ABSTRACT

We extend RDF with the ability to represent property val- 
ues that exist, but are unknown or partially known, using 
constraints. Following ideas from the incomplete informa- 
tion literature, we develop a semantics for this extension of 
RDF, called RDF^i, and study SPARQL query evaluation in
this framework. 

1. INTRODUCTION 

Incomplete information has been studied in-depth in rela- 
tional databases [11, 5] and knowledge representation. It is 
also an important issue in Semantic Web frameworks such as 
RDF, description logics, and OWL 2, especially given that all
these systems rely on the Open World Assumption (OWA).
Adopting the OWA means that we cannot capture negative
information implicitly, i.e., if a formula φ is not entailed by
our knowledge base, we cannot assume its negation as in the 
Closed World Assumption (CWA). 

Application knowledge captured by databases and knowl- 
edge bases is often incomplete, thus the OWA is a useful as- 
sumption to make. In general, the richer an application do- 
main is, the more possible it is that a framework based on in- 
complete information will be required. Incomplete informa- 
tion can also arise even if we start from complete databases, 
e.g., in relational view updates, data integration, data ex- 
change, etc., thus the detailed study of incomplete informa- 
tion has been a recurring theme in the literature throughout 
the years. 

In the context of the Web, incomplete information has re- 
cently been studied in detail for XML [1, 4]. As Semantic 
Web technologies achieve maturity and gain acceptance in 
a wide variety of application domains through the creation 
of ontologies and linked data pools, we expect the study of 
issues related to incomplete information to gain more atten- 
tion in the Semantic Web community as well. There have 
been some recent papers that confirm our expectations. 

[8] introduces the concept of anonymous timestamps in 
general temporal RDF graphs, i.e., graphs containing quads 
of the form (s, p, o) [t] where t is a timestamp (a natural num- 
ber) or an anonymous timestamp x stating that the triple 
(s,p,o) is valid in some unknown time point x. [10] subse- 
quently extends the concept of general temporal RDF graphs 
of [8] so that one is allowed to express temporal constraints 
involving anonymous timestamps using a formula φ which
is a conjunction of order constraints x1 OP x2 where OP is
an arithmetic comparison operator such as <, ≤, etc. [10]
calls c-temporal graphs the resulting pairs (G, φ) where G
is a general temporal RDF graph and φ is a conjunction of
constraints. [10] defines a semantics for c-temporal graphs
and studies the relevant problem of entailment. 

More recently, [3] examines the question of whether 
SPARQL is an appropriate language for RDF given the 
OWA typically associated with the framework. It defines 
a certain answer semantics for SPARQL query evaluation 
based on well-known ideas from incomplete information re- 
search. According to this semantics, if G is an RDF graph 
then evaluating a SPARQL query q over G is defined as 
evaluating q over all graphs H ⊇ G that are possible
extensions of G according to the OWA, and then taking the
intersection of all answers. [3] shows that if we evaluate a 
monotone graph pattern (e.g., one using only the operators 
AND, UNION, and FILTER) using the well-known W3C 
semantics, we get the same result we would get if we used 
the certain answer semantics. The converse also holds, thus 
monotone SPARQL graph patterns are exactly the ones that 
have this nice property. However, OPTIONAL (OPT) is not 
a monotone operator and the two semantics do not coincide 
for it. [3] defines the notion of weak monotonicity that ap- 
pears to capture the intuition behind OPT, and shows that 
a SPARQL query q is weakly monotone if and only if eval- 
uating q under the W3C semantics gives the same result as 
evaluating q under a new semantics appropriate for weakly 
monotone queries. Finally, [3] shows that the fragment of 
SPARQL consisting of the well-designed graph patterns de- 
fined originally in [21] is weakly monotone. 

1.1 Contributions 

In this paper we continue the line of research started by [8, 
10, 3] and study in a general way an important kind of in- 
complete information that has so far been ignored in the 
context of RDF. Our contributions are the following. 

First, we extend RDF with the ability to define a new kind 
of literals for each datatype. These literals will be called e- 
literals ("e" comes from the word "existential") and can be 
used to represent values of properties that exist but are un- 
known or partially known. Such information is abundant 
in recent applications where RDF is being used (e.g., sen- 
sor networks, the modeling of geospatial information, etc.). 
In the proposed extension of RDF, called RDF^i (where "i"
stands for "incomplete"), e-literals are allowed to appear only
in the object position of triples. 

Previous research on incomplete information in databases 
and knowledge representation has shown that in many appli- 
cations, having the ability to state constraints about values 
that are only partially known is a very desirable feature and 
leads to the development of very expressive formalisms [5,
13]. In the spirit of this tradition, RDF^i allows partial
information regarding property values represented by e-literals
to be expressed by a quantifier-free formula of a first-order
constraint language ℒ. Thus, RDF^i extends the concept of
an RDF graph to the concept of an RDF^i database which is
a pair (G, φ) where G is an RDF graph possibly containing
triples with e-literals in their object positions, and φ is a
quantifier-free formula of ℒ.

Following ideas from the incomplete information literature
[11, 5], we develop a semantics for RDF^i databases and
SPARQL query evaluation. The semantics defines the set
of possible RDF graphs corresponding to an RDF^i database
and the fundamental concept of certain answer for SPARQL
query evaluation over an RDF^i database. We transfer the
well-known concept of representation system from the seminal
paper of [11] to the case of RDF^i, and show that CONSTRUCT
queries without blank nodes in their templates
and using only operators AND, UNION, and FILTER or
the restricted fragment of graph patterns corresponding to
the well-designed patterns of [3] can be used to define a
representation system for RDF^i. Our results for the
monotonicity of CONSTRUCT queries (even in the case of
well-designed patterns that contain operator OPT) indicate their
importance and make them an interesting topic for theoretical
treatments of RDF.

We define the fundamental concept of certain answer to 
SPARQL queries over RDF^i databases and present an algorithm
for its computation. We demonstrate the usefulness
of RDF^i in geospatial Semantic Web applications by considering
appropriate languages ℒ allowing us to express binary
spatial relations among regions in the plane [23]. We chose 
this example domain since it contains many natural prob- 
lems in which incomplete information representations are 
useful. In addition, the availability of geospatial data in the 
linked data cloud is rising rapidly, therefore good data mod- 
els and languages are needed for dealing with them [14, 18] . 
Finally, we present preliminary complexity results for the 
problem of certain answer computation by concentrating on 
spatial constraints only. 

The organization of the paper is as follows. Section 2
introduces RDF^i by giving examples and comparing it with
well-known concepts of the incomplete relational database
literature. Section 3 presents some useful constraint languages
that can be used in RDF^i to model incomplete information
for geospatial applications. Section 4 presents RDF^i
and then Section 5 defines its semantics. Section 6 defines
the evaluation of SPARQL queries over RDF^i databases.
Section 7 presents fragments of SPARQL that can be used
to define a representation system for RDF^i. Section 8 gives
an algorithm for computing the certain answer for SPARQL
queries over RDF^i databases. In addition, it gives some easy
upper bounds for the data complexity of evaluating CONSTRUCT
queries over RDF^i databases with spatial constraints.
Sections 9 and 10 discuss related and future work,
respectively.

2. MOTIVATION 

Incomplete information is often present in applications, 
e.g., geospatial ones where data is imprecise, indefinite, or 
qualitative. For example, in the FP7 European project 
TELEIOS satellite images are used for environmental dis- 
aster monitoring (e.g., fires, floods). The following is a list 
of triples (namespaces are omitted) that gives an example
of the kind of representation employed in TELEIOS for
representing pixels of a satellite image (called hotspots)
corresponding to geographic regions that are probably on fire.

hotspot1 type Hotspot . fire1 type Fire .
hotspot1 correspondsTo fire1 . fire1 occuredIn point1 .
point1 hasGeometry
"x = 24.825668 ∧ y = 35.310643"^^SemiLinearPointSet .

The above set of triples is a graph in the model stRDF
[14] which extends RDF with the ability to represent geometries
over Q^k that change over time following the paradigm
of constraint databases. In stRDF, geometries and valid
times of triples are expressed using Boolean combinations
of linear constraints that are given as literals of type
SemiLinearPointSet defined in [14]. Semi-linear point sets are
the subsets of Q^k defined by Boolean combinations of linear
constraints. The above graph represents definite information;
it states that there is a hotspot corresponding to a fire
taking place at the point (24.825668, 35.310643) ∈ Q^2.

hotspot1 type Hotspot .
fire1 type Fire .
hotspot1 correspondsTo fire1 .
fire1 occuredIn _R1 .

_R1 NTPP "x > 6 ∧ x < 23 ∧ y > 8 ∧ y < 19"

Figure 1: (a) An RDF^i database, (b) rectangles mentioned
in the examples

Due to the medium resolution of the satellite images, each 
image pixel representing a hotspot corresponds to a 3km by 
3km rectangle in geographic space. Thus, a more appro- 
priate representation of the real world situation that corre- 
sponds to a hotspot would be to state that there is a geo- 
graphic region with unknown exact coordinates where a fire 
is taking place, and that region is included in a known 3km 
by 3km rectangle. Figure 1(a) shows an RDF^i database for
this representation. Fire fire1 (red area of Figure 1(b)) is
asserted to have taken place inside region _R1. _R1 is an
e-literal of datatype SemiLinearPointSet and is asserted
to be inside the rectangle formed by the points (6, 8) and
(23, 19) (rectangle P of Figure 1(b)). This is stated with
a constraint expressed in the language PCL (to be defined
later). NTPP is the "non-tangential-proper-part" relation of
RCC-8 [23]. Constraints in PCL can express qualitative and
quantitative spatial information about regions in Q^2.

E-literals are like existentially quantified variables in first- 
order logic or Skolem constants. However, e-literals are dif- 
ferent from blank nodes and this is captured formally by our 
definitions in Section 4. A similar assumption is made by [8, 
10] where anonymous nodes are taken to be different from 
blank nodes. RDF^i databases like the one of Figure 1(a)
consist of two parts: a graph (i.e., a set of triples) and a global
constraint. Global constraints can in general be quantifier-free
formulae of some first-order constraint language. RDF^i
databases are syntactic devices for the representation of
incomplete information.

Example 2.1. Let us consider the query "Find all fires
that have occurred in a region which is a non-tangential
proper part of rectangle Q1 of Figure 1(b)" over the database
of Figure 1(a). Using the algebraic syntax of Section 6 for
SELECT queries in SPARQL, this query can be expressed
as follows:

({?F}, (?F, type, Fire) AND (?F, occuredIn, ?R)
FILTER (?R NTPP "x > 10 ∧ x < 21 ∧ y > 12 ∧ y < 17"))

If we examine the database of Figure 1(a), we can see that 
the answer should be conditional [11]. We cannot say for 
sure whether fire1 satisfies the requirements of the query
because the information in the database is indefinite (the
exact geometry of _R1 is not known). Fire fire1 qualifies
only in the possible graphs where _R1 is a non-tangential
proper part of the rectangle mentioned in the query. For
every object that qualifies as an answer, the query answering
procedure should also provide a condition characterizing
this set of possible graphs. Following the ideas of conditional
tables from [11], this answer can be represented by the
following set of conditional mappings:

{({?F → fire1},
_R1 NTPP "x > 10 ∧ x < 21 ∧ y > 12 ∧ y < 17")}

Conditional mappings are different from standard 
SPARQL mappings [21] in the sense that they map variables 
to constants only if a condition holds. Thus, they are remi- 
niscent of conditional tuples in the conditional table model 
of [5]. 

Example 2.2. If we wanted to have an RDF^i database
as the answer to a query like the one of Example 2.1, then
we would have queried the RDF^i database of Figure 1(a)
using the CONSTRUCT query form of SPARQL. Using
the algebraic syntax of CONSTRUCT, the query would be
expressed as follows:

({(?F, type, Fire)}, (?F, type, Fire) AND (?F, occuredIn, ?R)
FILTER (?R NTPP "x > 10 ∧ x < 21 ∧ y > 12 ∧ y < 17"))

The answer to this query would be an RDF^i database
containing conditional triples adhering to the query template
(i.e., {(?F, type, Fire)}). The template is instantiated for
each conditional mapping from the evaluation of the graph
pattern of the query, and the resulting triple together with
the condition of the mapping form a conditional triple in
the resulting database. Therefore, the answer to the query of
Example 2.2 consists of the following conditional triple:

((fire1 type Fire),
_R1 NTPP "x > 10 ∧ x < 21 ∧ y > 12 ∧ y < 17")

In some cases the user might know that the information 
in the database is incomplete. Thus, she might wish to find 
all values that certainly satisfy some qualification (this is 
the well-known notion of certain answer in the incomplete 
databases literature [5]). Let us consider the previous query 
again. If we rephrase it to "Find fires that have certainly
occurred in a region which is a non-tangential proper part of
rectangle Q2 of Figure 1(b)", then fire1 satisfies the query
unconditionally and the certain answer is the set of
mappings {{?F → fire1}}.

3. CONSTRAINT LANGUAGES 

We will consider many-sorted first-order languages, structures,
and theories. Every language ℒ will be interpreted
over a fixed structure, called the intended structure, which
will be denoted by M_ℒ. If M_ℒ is a structure then Th(M_ℒ)
will denote the theory of M_ℒ, i.e., the set of sentences of ℒ
that are true in M_ℒ. For every language ℒ, we will distinguish
a class of quantifier-free formulae called ℒ-constraints.
The atomic formulae of ℒ will be included in the class
of ℒ-constraints. There will also be two distinguished
ℒ-constraints true and false with obvious semantics.

Every first-order language ℒ we consider has a distinguished
equality predicate, denoted by EQ, that has the
standard semantics. The class of ℒ-constraints is assumed
to: a) contain all formulae t1 EQ t2 where t1, t2 are terms of
ℒ, and b) be weakly closed under negation, i.e., the negation
of every ℒ-constraint is equivalent to a disjunction of
ℒ-constraints. This property is needed in Section 8 where the
certain answer to a SPARQL query over an RDF^i database
is computed.
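
As an illustration of weak closure under negation, the following Python sketch lists atoms whose disjunction is equivalent to the negation of a topological atom. It assumes the six predicates DC, EC, PO, EQ, TPP, and NTPP of the languages defined below, and it relies on the standard fact that the eight RCC-8 base relations are jointly exhaustive and pairwise disjoint. The encoding of an atom as a (term, predicate, term) tuple and the names base_relations and negate are hypothetical choices made for this example only.

```python
# RCC-8 base relations written with six binary predicates.
# TPPi and NTPPi are expressed by swapping the arguments of TPP and NTPP.
SYMMETRIC = ("DC", "EC", "PO", "EQ")
ASYMMETRIC = ("TPP", "NTPP")


def base_relations(a, b):
    """All eight RCC-8 relations between a and b as (term, predicate, term)."""
    atoms = [(a, p, b) for p in SYMMETRIC + ASYMMETRIC]
    atoms += [(b, p, a) for p in ASYMMETRIC]  # TPPi and NTPPi
    return atoms


def negate(constraint):
    """Return atoms whose disjunction is equivalent to the negated atom."""
    a, pred, b = constraint
    if pred in SYMMETRIC:
        # A symmetric atom may be written in either argument order.
        excluded = {(a, pred, b), (b, pred, a)}
    else:
        excluded = {(a, pred, b)}
    return [atom for atom in base_relations(a, b) if atom not in excluded]


disjuncts = negate(("r1", "NTPP", "r2"))
```

Here disjuncts holds the seven remaining base relations between r1 and r2, including ("r2", "NTPP", "r1"), which plays the role of NTPPi.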

The constraint languages presented here have been defined
so that the reader can appreciate the scope of modeling
possibilities for geospatial applications. The examples
illustrating the RDF^i framework throughout the paper are
based on PCL. Language TCL will be useful in Section 8
when some preliminary complexity results are given, and
in Section 9 when RDF^i is compared with other geospatial
modeling frameworks.

3.1 The Language TCL 

The language TCL (Topological Constraint Language) allows
us to represent topological properties of non-empty, regular
closed subsets of Q^2 (we will call these subsets regions
for brevity). TCL is a first-order language with the following
6 binary predicate symbols: DC, EC, PO, EQ, TPP, and
NTPP. An atomic formula of TCL is a formula of the form
r1 R r2 where r1, r2 are variables and R is one of the above
predicates. We will often use the terminology ℒ-constraints
to refer to atomic formulae of an arbitrary constraint language
ℒ. For example, the following are TCL-constraints:

r1 NTPP r2, r2 PO r3, r2 EQ r3

The intended structure for TCL, denoted by M_TCL, has
the set of regions as its domain, and interprets each of the 
predicate symbols given above by the corresponding topo- 
logical relation of RCC-8 [23]. Note that relations NTPPi 
and TPPi of RCC-8 are not included in the vocabulary of 
TCL since they can be expressed by interchanging the argu- 
ments of NTPP and TPP. 

The language TCL allows us to capture the topology of re- 
gions of interest to an application but makes no commitment 
regarding other non-topological properties of these regions, 
e.g., shape. The language PCL considered below deals with 
polygonal shapes. 

3.2 The Language PCL 

The language PCL (Polygon Constraint Language) allows
us to represent topological properties of polygons in Q^2.
PCL is a first-order language with the following 6 binary
predicate symbols: DC, EC, PO, EQ, TPP, and NTPP. The
constant symbols of PCL represent polygons in Q^2. We will
write these constants as conjunctions of linear constraints in
quotes (half-space representation of the convex polygon) as
we already did in the examples of Section 2. The terms and
atomic formulae of PCL are defined as follows. Constants
and variables are terms. An atomic formula of PCL
(PCL-constraint) is a formula of the form t1 R t2 where t1, t2 are
terms and R is one of the above predicates. For example,
the following are PCL-constraints:

r1 NTPP r2, r2 NTPP "x − y > 0 ∧ x < 1 ∧ y > 0"

The intended structure for PCL, denoted by M_PCL, has
the set of non-empty, regular closed subsets of Q^2 as its
domain (we will call these subsets regions for brevity). M_PCL
interprets each constant symbol by the corresponding polygon
in Q^2 and each of the predicate symbols by the
corresponding topological relation of RCC-8 [23].
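
To make the interpretation of the predicates concrete, the following Python sketch decides which RCC-8 relation holds between two axis-aligned rectangles, the only kind of polygon used in the examples of this paper. It is a simplified illustration, not a decision procedure for PCL: the Rect type and the rcc8 function are hypothetical names, and reading each quoted constant as the closed rectangle it bounds is an assumption of the sketch.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Closed axis-aligned rectangle [x1, x2] x [y1, y2], x1 < x2 and y1 < y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def rcc8(a: Rect, b: Rect) -> str:
    """Return the RCC-8 relation that holds between rectangles a and b."""
    if a == b:
        return "EQ"
    # Closed rectangles share a point iff projections overlap on both axes.
    touch = a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2
    if not touch:
        return "DC"
    # Interiors overlap iff the projections overlap strictly on both axes.
    overlap = a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2
    if not overlap:
        return "EC"
    a_in_b = b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2
    b_in_a = a.x1 <= b.x1 and b.x2 <= a.x2 and a.y1 <= b.y1 and b.y2 <= a.y2
    if a_in_b:
        # Non-tangential: a does not touch the boundary of b.
        strict = b.x1 < a.x1 and a.x2 < b.x2 and b.y1 < a.y1 and a.y2 < b.y2
        return "NTPP" if strict else "TPP"
    if b_in_a:
        strict = a.x1 < b.x1 and b.x2 < a.x2 and a.y1 < b.y1 and b.y2 < a.y2
        return "NTPPi" if strict else "TPPi"
    return "PO"


P = Rect(6, 8, 23, 19)         # rectangle P of the examples in Section 2
INNER = Rect(10, 12, 21, 17)   # rectangle used in the query of Example 2.1
relation = rcc8(INNER, P)
```

The inverse relations NTPPi and TPPi are returned only for readability; in TCL and PCL they are written by swapping the arguments of NTPP and TPP.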

4. THE RDF^i FRAMEWORK

As in theoretical treatments of RDF [21], we assume the 
existence of pairwise-disjoint, countably infinite sets I, B, 
and L that contain IRIs, blank nodes, and literals respectively.
We also assume the existence of a datatype map M [9]
and distinguish a set of datatypes A from M for which
e-literals are allowed. Finally, we assume the existence of a
many-sorted first-order constraint language ℒ with the
properties discussed in Section 3. ℒ is related to the datatype
map M in the following way:

• The set of sorts of ℒ is the set of datatypes A of M.

• The set of constants of ℒ is the union of the lexical
spaces of the datatypes in A.

• M_ℒ interprets every constant c of ℒ with sort d by
its corresponding value given by the lexical-to-value
mapping of the datatype d in A.

The set of constants of ℒ (equivalently: the set of literals
of the datatypes in A) will be denoted by C. We also assume
the existence of a countably infinite set of e-literals for each
datatype in A and use U to denote the union of these sets.
By convention, the identifiers of e-literals will start with an
underscore, e.g., _R5. C and U are assumed to be disjoint
from each other and from I, B, and L. The set of RDF^i
terms, denoted by T, can now be defined as the union
I ∪ B ∪ L ∪ C ∪ U.

In the rest of our examples we will assume that ℒ is PCL,
so C is the set of all polygons in Q^2 written in the linear
constraint syntax of Section 3.

We now define the basic concepts of RDF^i: e-triples,
conditional triples, conditional graphs, global constraints, and
databases. Triples in RDF^i (called e-triples) are as in RDF
but now e-literals are also allowed in the object position.
Combining an e-triple with a conjunction of ℒ-constraints,
we get a conditional triple. Graphs in RDF^i are conditional
and consist of sets of conditional triples. Global constraints
are simply Boolean combinations of ℒ-constraints. The
combination of a conditional graph and a global constraint is
called a database.

Definition 4.1. An e-triple is an element of the set
(I ∪ B) × I × T. If (s, p, o) is an e-triple, s will be called the
subject, p the predicate, and o the object of the triple. A
conditional triple is a pair (t, θ) where t is an e-triple and
θ is a conjunction of ℒ-constraints. If (t, θ) is a conditional
triple, θ will be called the condition of the triple.

Definition 4.2. A global constraint is a Boolean combination
of ℒ-constraints.

Definition 4.3. A conditional graph is a set of conditional
triples. An RDF^i database D is a pair D = (G, φ)
where G is a conditional graph and φ a global constraint.

In the rest of the paper, when we want to refer to standard 
RDF constructs we will write "RDF triple" and "RDF graph" 
so that no confusion with RDF^i is possible.



Example 4.4. The following pair is an RDF^i database.

( { ((hotspot1, type, Hotspot), true),
((fire1, type, Fire), true),
((hotspot1, correspondsTo, fire1), true),
((fire1, occuredIn, _R1), true) },

_R1 NTPP "x > 6 ∧ x < 23 ∧ y > 8 ∧ y < 19" )
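
Definitions 4.1 to 4.3 can be mirrored by a small data structure. The Python sketch below is only one possible encoding and is not part of the framework: the class names ConditionalTriple and Database, the tuple encoding of atomic constraints, and the restriction of the global constraint to a conjunction are all assumptions made for brevity (Definition 4.2 allows arbitrary Boolean combinations).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalTriple:
    """An e-triple (subject, predicate, obj) paired with its condition."""
    subject: str
    predicate: str
    obj: str
    condition: tuple = ()  # conjunction of (term, pred, term) atoms; () is true


@dataclass(frozen=True)
class Database:
    """A pair (G, phi) of a conditional graph and a global constraint."""
    graph: frozenset          # set of ConditionalTriple
    global_constraint: tuple  # simplification: a conjunction of atoms only


def is_e_literal(term: str) -> bool:
    """E-literal identifiers start with an underscore by convention.

    Blank node labels of the form "_:b" are excluded (an assumption here).
    """
    return term.startswith("_") and not term.startswith("_:")


P = '"x > 6 ∧ x < 23 ∧ y > 8 ∧ y < 19"'

# The database of Example 4.4; every condition is true.
example_4_4 = Database(
    graph=frozenset({
        ConditionalTriple("hotspot1", "type", "Hotspot"),
        ConditionalTriple("fire1", "type", "Fire"),
        ConditionalTriple("hotspot1", "correspondsTo", "fire1"),
        ConditionalTriple("fire1", "occuredIn", "_R1"),
    }),
    global_constraint=(("_R1", "NTPP", P),),
)

# E-literals may appear only in the object position of a triple.
well_formed = all(
    not is_e_literal(t.subject) and not is_e_literal(t.predicate)
    for t in example_4_4.graph
)
```

Frozen dataclasses are used so that conditional triples are hashable and can be stored in a set, matching the set-based definition of a conditional graph.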

Example 4.5. The following pair is an RDF^i database
with a disjunctive global constraint.

( { ((hotspot1, type, Hotspot), true),
((fire1, type, Fire), true),
((hotspot1, correspondsTo, fire1), true),
((fire1, occuredIn, _R1), true),
((fire2, occuredIn,
"x > 6 ∧ x < 23 ∧ y > 8 ∧ y < 19"), true) },

(_R1 NTPP "x > 6 ∧ x < 23 ∧ y > 8 ∧ y < 19" ∧
_R1 NTPP "x > 10 ∧ x < 21 ∧ y > 12 ∧ y < 17") ∨

_R1 PO "x > 2 ∧ x < 6 ∧ y > 4 ∧ y < 8" )

5. SEMANTICS OF RDF^i

The semantics of RDF^i are inspired by [11]. An RDF^i
database D = (G, φ) corresponds to a set of possible RDF
graphs each one representing a possible state of the real
world. This set of possible graphs captures completely the
semantics of an RDF^i database. The global constraint φ
determines the number of possible RDF graphs corresponding
to D; there is one RDF graph for each solution of φ obtained
by considering the e-literals of φ as variables and solving the
constraint φ.

Example 5.1. Let D = (G, φ) be the RDF^i database
given in Example 4.4. Database D mentions a hotspot, which
is located in a region that is inside but does not intersect with
the boundary of rectangle P of Figure 1(b). The same knowledge
can be represented by an (infinite) set of possible RDF
graphs, one for each rectangle inside P. Two of these graphs
are:

G1 = { (hotspot1, type, Hotspot), (fire1, type, Fire),
(hotspot1, correspondsTo, fire1),
(fire1, occuredIn, "x > 11 ∧ x < 15 ∧ y > 13 ∧ y < 15") }

G2 = { (hotspot1, type, Hotspot), (fire1, type, Fire),
(hotspot1, correspondsTo, fire1),
(fire1, occuredIn, "x > 10 ∧ x < 21 ∧ y > 12 ∧ y < 17") }

In order to be able to go from RDF^i databases to the
equivalent set of possible RDF graphs, the notion of valuation
is needed. Informally, a valuation maps an e-literal to
a specific constant from C.

Definition 5.2. A valuation v is a function from U to
C assigning to each e-literal from U a constant from C.

We denote by v(t) the application of valuation v to an e-triple
t. v(t) is obtained from t by replacing any e-literal _l
appearing in t by v(_l) and leaving all other terms the same.
If θ is a formula of ℒ (e.g., the condition of a conditional
triple or the global constraint of a database) then v(θ)
denotes the application of v to formula θ. The expression v(θ)
is obtained from θ by replacing all e-literals _l of θ by v(_l).

Next, we give the definition of applying a valuation to a 
conditional graph. 



Definition 5.3. Let G be a conditional graph and v a
valuation. Then v(G) denotes the RDF graph

{ v(t) | (t, θ) ∈ G and M_ℒ ⊨ v(θ) }.

The set of valuations that satisfy the global constraint of
an RDF^i database determines the set of possible RDF graphs
that correspond to it. This set of graphs is denoted using
the function Rep, as is traditional in incomplete relational
databases.

Definition 5.4. Let D = (G, φ) be an RDF^i database.
The set of RDF graphs corresponding to D is the following:

Rep(D) = { H | there exists a valuation v
such that M_ℒ ⊨ v(φ) and H ⊇ v(G) }
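
The two definitions above can be traced on the database of Example 4.4 with a short Python sketch. Several simplifications are assumed and are not part of the formal development: constants are axis-aligned rectangles encoded as (x1, y1, x2, y2) tuples, e-literals are strings, a valuation is a dictionary covering only the e-literals in use, conditions and the global constraint are conjunctions of atoms, and only NTPP is implemented. Since Rep(D) is infinite in general, the hypothetical function witnesses_membership checks membership for one given valuation rather than enumerating Rep(D).

```python
def ntpp(a, b):
    """Rectangle a lies strictly inside rectangle b (no boundary contact)."""
    return b[0] < a[0] and b[1] < a[1] and a[2] < b[2] and a[3] < b[3]


HOLDS = {"NTPP": ntpp}  # only the predicate needed for this illustration


def subst(term, v):
    """Replace an e-literal by its value under v; leave other terms as is."""
    return v.get(term, term) if isinstance(term, str) else term


def satisfied(conjunction, v):
    """Check that v(theta) holds for a conjunction of atoms (empty is true)."""
    return all(HOLDS[p](subst(a, v), subst(b, v)) for a, p, b in conjunction)


def apply_valuation(graph, v):
    """v(G): keep triples whose condition holds and replace e-literals."""
    return {
        (s, p, subst(o, v))
        for (s, p, o), theta in graph
        if satisfied(theta, v)
    }


def witnesses_membership(h, graph, phi, v):
    """True if v shows that the RDF graph h belongs to Rep((graph, phi))."""
    return satisfied(phi, v) and apply_valuation(graph, v) <= h


# Conditional graph and global constraint of Example 4.4.
G = {
    (("hotspot1", "type", "Hotspot"), ()),
    (("fire1", "type", "Fire"), ()),
    (("hotspot1", "correspondsTo", "fire1"), ()),
    (("fire1", "occuredIn", "_R1"), ()),
}
phi = (("_R1", "NTPP", (6, 8, 23, 19)),)

v1 = {"_R1": (11, 13, 15, 15)}  # yields graph G1 of Example 5.1
G1 = apply_valuation(G, v1)
H = G1 | {("fire2", "type", "Fire")}  # a superset, allowed by the OWA
```

With these definitions, witnesses_membership(H, G, phi, v1) holds because v1 satisfies the global constraint and H contains v1(G), whereas a valuation sending _R1 to a rectangle outside P witnesses nothing.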

In incomplete relational databases [11], Rep is a semantic 
function: it is used to map a table (a syntactic construct) 
to a set of relational instances (i.e., a set of possible worlds,
a semantic construct). According to the well-known distinction
between model theoretic and proof theoretic approaches
to relational databases, Rep and the approaches based on it
[11, 5] belong to the model theoretic camp. However, the
use of function Rep in the above definition is different. Rep
takes an RDF^i database (a syntactic construct) and maps
it to a set of possible RDF graphs (a syntactic construct 
again). This set of possible graphs can then be mapped to 
a set of possible worlds using the well-known RDF model 
theory [9]. This is a deliberate choice in our work since 
we want to explore which well-known tools from incomplete 
relational databases carry over to the RDF framework. 

Notice that the definition of Rep above uses the contain- 
ment relation instead of equality. The reason for this is to 
capture the OWA that the RDF model makes. By using the 
containment relation, Rep(D) includes all graphs H contain- 
ing at least the triples of v(G). In this respect, we follow 
the approach of [3, Section 3], where the question of whether 
SPARQL is a good language for RDF is examined in the light 
of the fact that RDF adopts the OWA. To account for this, 
an RDF graph G is seen to correspond to a set of possible
RDF graphs H such that G ⊆ H (in the sense of the OWA:
all triples in G also hold in H). The above definition takes
this concept of [3] to its rightful destination: the full
treatment of incomplete information in RDF. As we have already
noted in the introduction, the kinds of incomplete information
we study here for RDF have not been studied in [3]; only
the issue of OWA has been explored there.

The following notation will be useful below. 

Notation 1. Let 𝒢 be a set of RDF graphs and q a
SPARQL query. The expression ⋂𝒢 will denote the set
⋂_{G ∈ 𝒢} G. The expression ⟦q⟧_𝒢, which extends the notation
of [21] to the case of sets of RDF graphs, will denote the
element-wise evaluation of q over 𝒢, that is,

⟦q⟧_𝒢 = { ⟦q⟧_G | G ∈ 𝒢 }.

Given the semantics of an RDF^i database as a set of possible
RDF graphs, what is an appropriate definition for the
answer to a certainty query? This is captured by the fol- 
lowing definition of certain answer which extends the corre- 
sponding definition of Section 3.1 of [3] by applying it to a 
more general incomplete information setting. 

Definition 5.5. Let q be a query and 𝒢 a set of RDF
graphs. The certain answer to q over 𝒢 is the set ⋂⟦q⟧_𝒢.
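
For a finite family of graphs, the definition of certain answer can be executed directly, as the Python sketch below shows. This is only an illustration of the definition: the set Rep(D) of an RDF^i database is infinite in general, so this is not the algorithm of Section 8. Representing a query as a Python function from a graph to a set of mappings, and a mapping as a frozenset of (variable, value) pairs, are assumptions of the sketch, as is the sample query fires.

```python
def evaluate_elementwise(query, graphs):
    """Evaluate the query over every graph of a finite family."""
    return [query(g) for g in graphs]


def certain_answer(query, graphs):
    """Intersect the answers of the query over a non-empty finite family."""
    answers = evaluate_elementwise(query, graphs)
    return set.intersection(*answers)


def fires(graph):
    """Sample query: bind ?F to every resource of type Fire."""
    return {
        frozenset({("?F", s)})
        for (s, p, o) in graph
        if p == "type" and o == "Fire"
    }


g1 = {("fire1", "type", "Fire")}
g2 = g1 | {("fire2", "type", "Fire")}
result = certain_answer(fires, [g1, g2])
```

Only the mapping that binds ?F to fire1 appears in the answer over both graphs, so it is the only element of result.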



Example 5.6. Let us consider the following query over
the database of Example 4.4: "Find all fires that have
occurred in a region which is a non-tangential proper part of
the rectangle Q2 of Figure 1(b)". The certain answer to this
query is the set of mappings {{?F → fire1}}.

6. EVALUATING SPARQL ON RDF^i
DATABASES

Let us now discuss how to evaluate SPARQL queries 
on RDF^i databases. We will use the algebraic syntax of
SPARQL presented in [14]. We will consider only the mono- 
tone graph pattern fragment of SPARQL which uses only 
the AND, UNION, and FILTER operators [3]. We will deal 
with both SELECT and CONSTRUCT query forms. Due 
to the presence of e-literals, query evaluation now becomes 
more complicated and is similar to query evaluation for con- 
ditional tables [11, 5]. The exact details will be given later 
in this section. 

We use set semantics for query evaluation by extending 
the SPARQL query evaluation approach of [21] . Blank nodes 
are interpreted as in SPARQL, i.e., as constants different 
from each other. Notice that this is not the same as the 
semantics of blank nodes in RDF model theory [9] where 
they are treated as existentially quantified variables. 

We assume the existence of the following disjoint sets of 
variables: (i) the set of normal query variables V_n that range
over IRIs, blank nodes, or RDF literals, and (ii) the set of
special query variables V_s that range over literals from the
set C or e-literals from the set U. We use V to denote the
set of all variables V_n ∪ V_s. Set V is assumed to be disjoint
from the set of terms T we defined in Section 4. 

We first define the concept of e-mapping ("e" from the 
word "existential") which extends the concept of mapping 
of [14] with the ability to have an e-literal as value of a 
special query variable. 

Definition 6.1. An e-mapping ν is a partial function ν :
V → T such that ν(x) ∈ I ∪ B ∪ L if x ∈ V_n and ν(x) ∈ C ∪ U
if x ∈ V_s.

Example 6.2. The following are e-mappings.

ν1 = { ?F → fire1, ?S → "x > 1 ∧ x < 2 ∧ y > 1 ∧ y < 2" }
ν2 = { ?F → fire1, ?S → _R1 }
ν3 = { ?F → fire1, ?S → _R2 }
ν4 = { ?F → fire1 }

The notions of domain and restriction of an e-mapping as 
well as the notion of compatibility of two e-mappings are 
defined as for mappings in the obvious way [21] (we also use 
the same notation for them). 

We now extend the concept of e-mapping and define con- 
ditional mappings, i.e., mappings that are equipped with a 
condition which constrains e-literals that appear in the e- 
mapping. 

Definition 6.3. A conditional mapping μ is a pair (ν, θ)
where ν is an e-mapping and θ is a conjunction of
ℒ-constraints.

Example 6.4. The following are conditional mappings.

μ1 = ({?F → fire1, ?S → "x > 1 ∧ x < 2 ∧ y > 1 ∧ y < 2"}, true)
μ2 = ({?F → fire1, ?S → _R1},
_R1 NTPP "x > 0 ∧ x < 10 ∧ y > 0 ∧ y < 10")
μ3 = ({?F → fire1, ?S → _R1},
(_R1 NTPP _R2) ∧
(_R2 DC "x > 0 ∧ x < 1 ∧ y > 0 ∧ y < 1") )
μ4 = ({?F → fire1, ?S → _R1}, true)

Notice that conditional mappings with constraint true,
such as μ4 above, are logically equivalent to e-mappings.

The notions of domain and restriction for a conditional 
mapping are now defined as follows. 

Definition 6.5. The domain of a conditional mapping
μ = (ν, θ), denoted by dom(μ), is the domain of ν, i.e., the
subset of V where the partial function ν is defined.

Definition 6.6. Let μ = (ν, θ) be a conditional mapping
with domain S and W ⊆ S. The restriction of the mapping
μ to W, denoted by μ|_W, is the mapping (ν|_W, θ) where ν|_W
is the restriction of mapping ν to W.

We now define the basic notion of triple pattern. 

Definition 6.7. A triple pattern is an element of the set 
(I ∪ V) × (I ∪ V) × (I ∪ L ∪ C ∪ U ∪ V).

Note that we do not allow blank nodes to appear in a triple 
pattern as in standard SPARQL since such blank nodes can 
equivalently be substituted by new query variables. 

If p is a triple pattern, var(p) denotes the variables
appearing in p. A conditional mapping can be applied to a
triple pattern. Let μ = (ν, θ) be a conditional mapping and
p a triple pattern such that var(p) ⊆ dom(μ). We denote by
μ(p) the triple obtained from p by replacing each variable
x ∈ var(p) by ν(x).

We now introduce the notion of compatible conditional 
mappings as in [21]. 

Definition 6.8. Two conditional mappings μ1 = (ν1, θ1)
and μ2 = (ν2, θ2) are compatible if the e-mappings ν1 and
ν2 are compatible, i.e., for all x ∈ dom(μ1) ∩ dom(μ2), we
have ν1(x) = ν2(x).

Example 6.9. Mappings and [12 from Example 6.4 are 
not compatible, while mappings \X2 and ^13 are. 

To take into account e-literals, we also need to define an- 
other notion of compatibility of two conditional mappings. 

Definition 6.10. Two conditional mappings $\mu_1 = (\nu_1, \theta_1)$ and $\mu_2 = (\nu_2, \theta_2)$ are possibly compatible if for all $x \in dom(\mu_1) \cap dom(\mu_2)$, we have $\nu_1(x) = \nu_2(x)$ or at least one of $\nu_1(x), \nu_2(x)$, where $x \in V_s$, is an e-literal from $U$.

Example 6.11. Conditional mappings $\mu_1$, $\mu_2$, and $\mu_3$ from Example 6.4 are pairwise possibly compatible.

If two conditional mappings are possibly compatible, then 
we can define their join as follows. 

Definition 6.12. Let $\mu_1 = (\nu_1, \theta_1)$ and $\mu_2 = (\nu_2, \theta_2)$ be possibly compatible conditional mappings. The join $\mu_1 \bowtie \mu_2$ is a new conditional mapping $(\nu_3, \theta_3)$ where:

i. $\nu_3(x) = \nu_1(x) = \nu_2(x)$ for each $x \in dom(\mu_1) \cap dom(\mu_2)$ such that $\nu_1(x) = \nu_2(x)$.

ii. $\nu_3(x) = \nu_1(x)$ for each $x \in dom(\mu_1) \cap dom(\mu_2)$ such that $\nu_1(x)$ is an e-literal and $\nu_2(x)$ is a literal from $C$.

iii. $\nu_3(x) = \nu_2(x)$ for each $x \in dom(\mu_1) \cap dom(\mu_2)$ such that $\nu_2(x)$ is an e-literal and $\nu_1(x)$ is a literal from $C$.

iv. $\nu_3(x) = \nu_1(x)$ for $x \in dom(\mu_1) \cap dom(\mu_2)$ such that both $\nu_1(x)$ and $\nu_2(x)$ are e-literals.

v. $\nu_3(x) = \nu_1(x)$ for $x \in dom(\mu_1) \setminus dom(\mu_2)$.

vi. $\nu_3(x) = \nu_2(x)$ for $x \in dom(\mu_2) \setminus dom(\mu_1)$.

vii. $\theta_3$ is $\theta_1 \wedge \theta_2 \wedge \xi_1 \wedge \xi_2 \wedge \xi_3$ where:

- $\xi_1$ is $\bigwedge_i \_v_i \ \text{EQ}\ \_t_i$, where the $\_v_i$'s and $\_t_i$'s are all the pairs of e-literals $\nu_1(x)$ and $\nu_2(x)$ from Case (iv) above. If there are no such pairs, then $\xi_1$ is true.

- $\xi_2$ is $\bigwedge_i \_w_i \ \text{EQ}\ l_i$, where the $\_w_i$'s and $l_i$'s are all the pairs of e-literals $\nu_1(x)$ and literals $\nu_2(x)$ from the set $C$ from Case (ii) above. If there are no such pairs, then $\xi_2$ is true.

- $\xi_3$ is $\bigwedge_i \_w_i \ \text{EQ}\ l_i$, where the $\_w_i$'s and $l_i$'s are all the pairs of e-literals $\nu_2(x)$ and literals $\nu_1(x)$ from the set $C$ from Case (iii) above. If there are no such pairs, then $\xi_3$ is true.

The predicate EQ used in the above definition is the equality predicate of $\mathcal{L}$.

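Example 6.13. If $\mu_1$ and $\mu_2$ are the conditional mappings of Example 6.4, then:

$\mu_1 \bowtie \mu_2 = (\{?F \to \text{fire1},\ ?S \to \_R1\},\ \text{true} \wedge \_R1 \ \text{EQ}\ "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2" \wedge \_R1 \ \text{NTPP}\ "x > 0 \wedge x < 10 \wedge y > 0 \wedge y < 10")$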
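Definitions 6.8, 6.10 and 6.12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: e-literals are taken to be strings with a leading underscore, a condition is a tuple of atomic constraints (the empty tuple standing for true), the restriction to the variables of $V_s$ is not modelled, and all function and constant names are hypothetical.

```python
from typing import Dict, Tuple

Atom = Tuple[str, str, str]            # e.g. ("_R1", "EQ", "some literal")
Cond = Tuple[Atom, ...]                # a conjunction; () stands for "true"
CondMapping = Tuple[Dict[str, str], Cond]


def is_eliteral(term: str) -> bool:
    # Assumption of this sketch: e-literals are written with a leading "_".
    return term.startswith("_")


def compatible(m1: CondMapping, m2: CondMapping) -> bool:
    """Definition 6.8: shared variables are bound to identical terms."""
    v1, v2 = m1[0], m2[0]
    return all(v1[x] == v2[x] for x in v1.keys() & v2.keys())


def possibly_compatible(m1: CondMapping, m2: CondMapping) -> bool:
    """Definition 6.10: shared variables agree, or one side is an e-literal."""
    v1, v2 = m1[0], m2[0]
    return all(
        v1[x] == v2[x] or is_eliteral(v1[x]) or is_eliteral(v2[x])
        for x in v1.keys() & v2.keys()
    )


def join(m1: CondMapping, m2: CondMapping) -> CondMapping:
    """Definition 6.12: join two possibly compatible conditional mappings."""
    if not possibly_compatible(m1, m2):
        raise ValueError("mappings are not possibly compatible")
    (v1, c1), (v2, c2) = m1, m2
    v3 = {**v2, **v1}                  # cases (i), (v), (vi); v1 wins on shared x
    extra = []
    for x in sorted(v1.keys() & v2.keys()):
        a, b = v1[x], v2[x]
        if a == b:
            continue                   # case (i): nothing to add
        if is_eliteral(a):             # cases (ii) and (iv): keep v1(x)
            extra.append((a, "EQ", b))
        else:                          # case (iii): keep the e-literal v2(x)
            v3[x] = b
            extra.append((b, "EQ", a))
    return v3, c1 + c2 + tuple(extra)


# The mappings mu_1 and mu_2 of Example 6.4 in this encoding.
SMALL = "x>1 & x<2 & y>1 & y<2"
BIG = "x>0 & x<10 & y>0 & y<10"
mu1 = ({"?F": "fire1", "?S": SMALL}, ())
mu2 = ({"?F": "fire1", "?S": "_R1"}, (("_R1", "NTPP", BIG),))
```

With this encoding, `join(mu1, mu2)` binds `?S` to `_R1` and carries both the NTPP constraint of `mu2` and the new EQ constraint, as in Example 6.13 (the order of the conjuncts differs, which does not matter for a conjunction).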

For two sets of conditional mappings $\Omega_1$ and $\Omega_2$, the operation of join is now defined as follows.

$\Omega_1 \bowtie \Omega_2 = \{\mu_1 \bowtie \mu_2 \mid \mu_1 \in \Omega_1, \mu_2 \in \Omega_2 \text{ are possibly compatible conditional mappings}\}$

The reader is invited to compare this definition with the definition of join of mappings for RDF [21]. The new thing with conditional mappings is that, due to the presence of e-literals, we have to anticipate the possibility that two mappings from $\Omega_1$ and $\Omega_2$ become compatible when e-literals are substituted by constants from $C$. We anticipate this case by adding relevant constraints to the condition of a mapping.

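The operation of union is defined as in the standard case:

$\Omega_1 \cup \Omega_2 = \{\mu \mid \mu \in \Omega_1 \text{ or } \mu \in \Omega_2\}$

We now define the operator of difference:

$\Omega_1 \setminus \Omega_2 =$
$\{\mu_1 \in \Omega_1 \mid$ for all $\mu_2 \in \Omega_2$, $\mu_1$ and $\mu_2$ are not compatible$\}$ $\cup$
$\{(\nu, \theta') \mid \mu = (\nu, \theta) \in \Omega_1$ and $\mu_1 = (\nu_1, \theta_1), \ldots, \mu_n = (\nu_n, \theta_n) \in \Omega_2$ such that $\mu_i, \mu$ are possibly compatible for all $1 \le i \le n$, and for every other $\mu_j \in \Omega_2$ different than the $\mu_i$'s, $\mu_j$ and $\mu$ are not compatible. In this case, $\theta'$ is
$\theta \wedge \bigwedge_i \big(\theta_i \supset \bigvee_x \neg(\mu(x) \ \text{EQ}\ \mu_i(x))\big)$
for every $x \in dom(\mu) \cap dom(\mu_i) \cap V_s$ and $1 \le i \le n\}$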



The reader is invited to compare this definition with the definition of difference in [21]. The new thing in RDF$^i$ is that we have to anticipate the possibility that a mapping $\mu$ from $\Omega_1$ is not compatible with all the mappings of $\Omega_2$ (i.e., it should be included in the difference) due to the presence of e-literals in it given some constraints. These constraints are added to the condition of $\mu$.

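Example 6.14. Let $\Omega_1 = \{\mu_{11}, \mu_{12}\}$, $\Omega_2 = \{\mu_{21}, \mu_{22}\}$ be sets of conditional mappings such that

$\mu_{11} = (\{?F \to \text{fire1},\ ?S \to \_R1\},\ \_R1 \ \text{NTPP}\ "x > 0 \wedge x < 10 \wedge y > 0 \wedge y < 10")$

$\mu_{12} = (\{?F \to \text{fire1},\ ?S \to "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2"\},\ \_R1 \ \text{PO}\ "x > 0 \wedge x < 10 \wedge y > 0 \wedge y < 10")$

$\mu_{21} = (\{?F \to \text{fire2}\},\ \text{true})$

$\mu_{22} = (\{?F \to \text{fire1},\ ?S \to "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2"\},\ \text{true})$

Then, $\Omega_1 \setminus \Omega_2 = \{\mu\}$ where $\mu$ has been constructed from $\mu_{11} \in \Omega_1$ and is the following mapping:

$\mu = (\{?F \to \text{fire1},\ ?S \to \_R1\},\ (\_R1 \ \text{NTPP}\ "x > 0 \wedge x < 10 \wedge y > 0 \wedge y < 10") \wedge \neg(\_R1 \ \text{EQ}\ "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2"))$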

The operation of left-outer join is defined as in the stan- 
dard case: 

$\Omega_1 ⟕ \Omega_2 = (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2)$

It has been noted in [21] that the OPT operator of 
SPARQL (the counterpart of the left-outer join algebraic 
operator) can be used to express difference in SPARQL. For 
data models that make the OWA, such an operator is unnat- 
ural since negative information cannot be expressed. How- 
ever, we deliberately include operator OPT because if it is 
combined with operators AND and FILTER under certain 
syntactic restrictions, it turns out that the resulting graph 
patterns cannot express a difference operator anymore [3]. 
In particular, the class of graph patterns produced by this syntactic restriction is known as well-designed graph patterns. Well-designed graph patterns are discussed in more depth in Section 7 where representation systems for RDF$^i$ are investigated.

We can now define the result of evaluating a graph pattern over an RDF$^i$ database (the definition of graph patterns is omitted). Given the previous operations on sets of mappings, graph pattern evaluation in RDF$^i$ can now be defined exactly as in standard SPARQL for RDF graphs [21] except for the case of evaluating a triple pattern.

Definition 6.15. Let $D = (G, \phi)$ be an RDF$^i$ database. Evaluating a graph pattern $P$ over database $D$ is denoted by $[\![P]\!]_D$ and is defined recursively as follows:

1. If $P$ is the triple pattern $(s, p, o)$ then we have two cases. If $o$ is a literal from the set $C$ then

$[\![P]\!]_D = \{\mu = (\nu, \theta) \mid dom(\mu) = var(P) \text{ and } (\mu(P), \theta) \in G\} \ \cup$
$\{\mu = (\nu, (\_l \ \text{EQ}\ o) \wedge \theta) \mid dom(\mu) = var(P),\ ((\nu(s), \nu(p), \_l), \theta) \in G \text{ and } \_l \in U\}$

else

$[\![P]\!]_D = \{\mu = (\nu, \theta) \mid dom(\mu) = var(P),\ (\mu(P), \theta) \in G\}$

2. If $P$ is $P_1$ AND $P_2$ then $[\![P]\!]_D = [\![P_1]\!]_D \bowtie [\![P_2]\!]_D$.

3. If $P$ is $P_1$ UNION $P_2$ then $[\![P]\!]_D = [\![P_1]\!]_D \cup [\![P_2]\!]_D$.

4. If $P$ is $P_1$ OPT $P_2$ then $[\![P]\!]_D = [\![P_1]\!]_D ⟕ [\![P_2]\!]_D$.

In the first item of the above definition the "else" part is to accommodate the case in which evaluation can be done as in standard SPARQL. This is the case in which the object part of the triple pattern is not a literal from $C$. The "if" part accommodates the case in which the triple pattern involves a literal $o$ from the set $C$. Here, there are two alternatives: the graph contains a conditional triple matching with every component of the triple pattern (i.e., a triple which has $o$ in the object position) or it contains a conditional triple with an e-literal $\_l$ from $U$ in the object position. We catch a possible match for the second case by adding in the condition of the mapping the constraint that restricts the value of e-literal $\_l$ to be equal to the literal $o$ of the triple pattern (i.e., the constraint $\_l \ \text{EQ}\ o$). In all cases of the first item of the above definition, since the triples in the database are conditional, their conditions become parts of the conditions of the mappings in the answer.

Example 6.16. Let us first give two examples for the evaluation of triple patterns over the database $D$ of Example 4.5.

$[\![(?F, \text{occuredIn}, "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2")]\!]_D = \{\ \} \cup \{(\{?F \to \text{fire1}\},\ \_R1 \ \text{EQ}\ "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2")\}$

$[\![(?F, \text{occuredIn}, "x > 6 \wedge x < 23 \wedge y > 8 \wedge y < 19")]\!]_D = \{(\{?F \to \text{fire2}\},\ \text{true})\} \cup \{(\{?F \to \text{fire1}\},\ \_R1 \ \text{EQ}\ "x > 6 \wedge x < 23 \wedge y > 8 \wedge y < 19")\}$

These examples correspond to the "if" part of the first item of Definition 6.15 in which the triple pattern involves a literal from the set $C$.
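The triple-pattern case of Definition 6.15 can be sketched as follows. The encoding is an assumption made for illustration only: variables start with `?`, e-literals with `_`, literals from $C$ are strings that begin with a double quote, and a condition is a tuple of atomic constraints (the empty tuple standing for true). The two-triple graph is hypothetical and was merely chosen to be consistent with the results shown in Example 6.16; it is not the database of Example 4.5.

```python
def is_var(t):
    return t.startswith("?")


def is_eliteral(t):
    return t.startswith("_")


def is_c_literal(t):
    # Assumption: literals from C are written as quoted strings.
    return t.startswith('"')


def match(pattern, terms):
    """Return the e-mapping that sends pattern to terms, or None."""
    nu = {}
    for p, t in zip(pattern, terms):
        if is_var(p):
            if nu.setdefault(p, t) != t:   # repeated variable, different value
                return None
        elif p != t:
            return None
    return nu


def eval_triple_pattern(pattern, graph):
    """Evaluate one triple pattern over a list of (triple, condition) pairs."""
    s, p, o = pattern
    answers = []
    for (ts, tp, to), theta in graph:
        nu = match((s, p, o), (ts, tp, to))
        if nu is not None:
            # Exact match: the condition of the triple is carried over.
            answers.append((nu, theta))
        elif is_c_literal(o) and is_eliteral(to):
            # Possible match: the e-literal may turn out to be equal to o.
            nu = match((s, p), (ts, tp))
            if nu is not None:
                answers.append((nu, ((to, "EQ", o),) + theta))
    return answers


A = '"x>1 & x<2 & y>1 & y<2"'
B = '"x>6 & x<23 & y>8 & y<19"'
graph = [
    (("fire1", "occuredIn", "_R1"), ()),
    (("fire2", "occuredIn", B), ()),
]
```

Evaluating `("?F", "occuredIn", A)` over this graph yields only the conditional match for `fire1`, while `("?F", "occuredIn", B)` yields the unconditional match for `fire2` together with the conditional match for `fire1`.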

Example 6.17. Let us now give an example of an evaluation of graph pattern $P_1$ AND $P_2$ over the database $D$ of Example 4.4, where $P_1, P_2$ are the triple patterns $(?F, \text{type}, \text{Fire})$ and $(?F, \text{occuredIn}, ?R)$ respectively. According to the above definition, we have:

$[\![(P_1 \text{ AND } P_2)]\!]_D = [\![P_1]\!]_D \bowtie [\![P_2]\!]_D =$
$[\![(?F, \text{type}, \text{Fire})]\!]_D \bowtie [\![(?F, \text{occuredIn}, ?R)]\!]_D =$
$\{(\{?F \to \text{fire1}\},\ \text{true})\} \bowtie \{(\{?F \to \text{fire1},\ ?R \to \_R1\},\ \text{true})\} =$
$\{(\{?F \to \text{fire1},\ ?R \to \_R1\},\ \text{true})\}$

The evaluation of both triple patterns $P_1, P_2$ corresponds to the "else" part of the first item of Definition 6.15. In this case evaluation is done as in standard SPARQL, but here conditions of matched triples have to be transferred to the respective answer, i.e., we have conditional mappings.

Let us now consider the operator FILTER. It is natural to allow FILTER graph patterns to contain conjunctions of $\mathcal{L}$-constraints as expressions that constrain query variables, e.g., constraints like $?X \ \text{NTPP}\ ?Y$ or $?X \ \text{EQ}\ "x > 1 \wedge x < 2 \wedge y > 1 \wedge y < 2"$ when $\mathcal{L}$ is PCL as in our examples.

The evaluation of FILTER graph patterns involving $\mathcal{L}$-constraints can now be defined as follows. Notice that the evaluation does not check for satisfaction of the constraints as in standard SPARQL [21], but simply imposes these constraints on the mappings that are in the answer of the graph pattern involved.



Definition 6.18. Given an RDF$^i$ database $D = (G, \phi)$, a graph pattern $P$ and a conjunction of $\mathcal{L}$-constraints $R$, we have:

$[\![P \text{ FILTER } R]\!]_D = \{\mu' = (\nu, \theta') \mid \mu = (\nu, \theta) \in [\![P]\!]_D \text{ and } \theta' \text{ is } \theta \wedge \nu(R)\}$

In the above definition, $\nu(R)$ denotes the application of e-mapping $\nu$ to condition $R$, i.e., the conjunction of $\mathcal{L}$-constraints obtained from $R$ when each variable $x$ of $R$ which also belongs to $dom(\nu)$ is substituted by $\nu(x)$.

The extension of FILTER to the case that $R$ is a Boolean combination of $\mathcal{L}$-constraints is now easy to define and is omitted. Similarly, the extension of FILTER to the case that $R$ also contains other built-in conditions of standard SPARQL [21] is easy to define and is omitted as well.

The following example illustrates the definition and shows that the purpose of constraint $\nu(R)$ is to deal in a uniform way with the case that the object of a triple is a constant from $C$ or an e-literal from $U$. Notice that $\nu(R)$ is required because mappings in our case can contain variables with e-literals as values, so we might not be able to decide the satisfaction of the constraints yet. Evaluation of FILTERs is therefore "lazy". In an implementation, one can also simplify constraints at this stage; such issues are beyond the scope of this paper.

Example 6.19. Based on the evaluation of the graph pattern of Example 6.17, the evaluation of the graph pattern $((P_1 \text{ AND } P_2) \text{ FILTER } R)$, where $R$ is the PCL-constraint $(?R \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17")$, is the following:

$[\![(P_1 \text{ AND } P_2) \text{ FILTER } R]\!]_D =$
$[\![((?F, \text{type}, \text{Fire}) \text{ AND } (?F, \text{occuredIn}, ?R)) \text{ FILTER } (?R \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17")]\!]_D =$
$\{(\{?F \to \text{fire1},\ ?R \to \_R1\},\ \_R1 \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17")\}$
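The "lazy" reading of FILTER in Definition 6.18 amounts to a substitution followed by a conjunction, with no satisfiability check. The following sketch shows this under the assumption that a condition is a tuple of atomic constraints of the form (left term, predicate, right term) and that variables start with `?`; the function name `apply_filter` is hypothetical.

```python
def apply_filter(mappings, R):
    """Impose the conjunction R on every conditional mapping.

    mappings: list of (nu, theta) pairs, where nu is a dict from variables
              to terms and theta is a tuple of atomic constraints.
    R:        tuple of atomic constraints that may mention variables.
    """
    result = []
    for nu, theta in mappings:
        # nu(R): replace each variable of R that nu binds; leave the rest.
        nu_R = tuple(tuple(nu.get(term, term) for term in atom) for atom in R)
        # theta' is theta AND nu(R); nothing is checked for satisfaction here.
        result.append((nu, theta + nu_R))
    return result


Q = '"x>10 & x<21 & y>12 & y<17"'
answers = [({"?F": "fire1", "?R": "_R1"}, ())]      # result of Example 6.17
filtered = apply_filter(answers, (("?R", "NTPP", Q),))
```

Here `filtered` contains the single mapping of Example 6.19, whose condition is `_R1 NTPP Q`. Whether that condition is satisfiable is left to a later stage, such as a constraint solver for the chosen constraint language.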

The next definition defines the concept of a SELECT 
query [21]. 

Definition 6.20. A SELECT query is a pair (W,P) 
where W is a set of variables from the set V and P is a 
graph pattern. 

Example 6.21. Let us consider the following query over the database of Example 4.4: "Find all fires that have occurred in a region which is a non-tangential proper part of rectangle $Q_1$ of Figure 1(b)". This query can be expressed as follows:

$(\{?F\},\ (?F, \text{type}, \text{Fire}) \text{ AND } (?F, \text{occuredIn}, ?R) \text{ FILTER } (?R \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17"))$

The next definition defines the notion of answer to a SELECT query. In contrast to SELECT queries over RDF graphs, SELECT queries over RDF$^i$ databases have answers that consist of conditional mappings so they might be harder to understand.

Definition 6.22. Let $q = (W, P)$ be a SELECT query. The answer to $q$ over an RDF$^i$ database $D = (G, \phi)$ (in symbols $[\![q]\!]_D$) is the set of conditional mappings $\{\mu|_W \mid \mu \in [\![P]\!]_D\}$.

The conditional mappings of the answer to a query might contain e-literals. These literals are constrained by the global constraint $\phi$, therefore $\phi$ can be understood to be implicitly included in the answer (this can also be done formally by considering answers to be pairs).

Example 6.23. The answer to the query from Example 6.21 can be obtained from the evaluation of the respective graph pattern from Example 6.19. The answer is a set that contains only the following mapping:

$(\{?F \to \text{fire1}\},\ \_R1 \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17")$

This answer is conditional. Because the information in the database of Example 4.4 is indefinite (the exact geometry of $\_R1$ is not known), we cannot say for sure whether fire1 satisfies the requirements of the query. These requirements are satisfied under the condition given in the above mapping.

Let us now introduce the notion of a template and define 
the CONSTRUCT query form. 

Definition 6.24. A template $E$ is a finite subset of the set $(T \cup V) \times (I \cup V) \times (T \cup V)$.

Thus, the elements of a template are like triple patterns 
but blank nodes are also allowed in the subject and object 
positions. We denote by var(E) and blank(E) the set of 
variables and set of blank nodes appearing in the elements 
of E respectively. 

Definition 6.25. A CONSTRUCT query is a pair 
(E, P) where E is a template and P a graph pattern. 

Example 6.26. Let us consider the query of Exam- 
ple 6.21. A new version of this query using the CON- 
STRUCT query form is: 

$(\{(?F, \text{type}, \text{Fire})\},\ (?F, \text{type}, \text{Fire}) \text{ AND } (?F, \text{occuredIn}, ?R) \text{ FILTER } (?R \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17"))$

Next we define what it means for a conditional mapping 
to be applied to a template. 

Definition 6.27. Let $\mu = (\nu, \theta)$ be a conditional mapping and $E$ a template. We denote by $\mu(E)$ the application of conditional mapping $\mu$ to template $E$. $\mu(E)$ is obtained from $E$ by replacing in $E$ every variable $x$ of $var(E) \cap dom(\mu)$ by $\nu(x)$.

Templates are used to specify the graph that results from 
the evaluation of a CONSTRUCT query. 

Example 6.28. Let us consider the template $E = \{(?F, \text{type}, ?Z), (?F, \text{occuredIn}, ?S)\}$ and mapping $\mu_4$ from Example 6.4. The result of applying $\mu_4$ to $E$ is the following set:

$\{(\text{fire1}, \text{type}, ?Z), (\text{fire1}, \text{occuredIn}, \_R1)\}$

Notice that Definition 6.27 does not require a conditional mapping to share any variables with the template to which it is applied. As a consequence, the first element of $\mu_4(E)$ is not a valid e-triple, i.e., it is not an element of the set $(I \cup B) \times I \times T$. Such a triple is dropped from the answer to a CONSTRUCT query (see Definition 6.30 below).

Next we define the concept of answer to a CONSTRUCT query. The definition extends the specification of standard SPARQL [22] to account for the RDF$^i$ framework and follows the formal approach of [20]. Before we give the definition, we need to introduce the notion of renaming function.



Definition 6.29. Let $E$ be a template, $P$ a graph pattern, and $D = (G, \phi)$ an RDF$^i$ database. The set $\{f_\mu \mid \mu \in [\![P]\!]_D\}$ is a set of renaming functions for $E$ and $[\![P]\!]_D$ if the following properties are satisfied: 1) the domain of every function $f_\mu$ is $blank(E)$ and its range is a subset of $(B \setminus blank(G))$, 2) every function $f_\mu$ is one-to-one, and 3) for every pair of distinct mappings $\mu_1, \mu_2 \in [\![P]\!]_D$, $f_{\mu_1}, f_{\mu_2}$ have disjoint ranges.

The application of a renaming function $f_\mu$ to a template $E$ is denoted by $f_\mu(E)$ and results in renaming the blank nodes of $E$ according to $f_\mu$.

Definition 6.30. Let $q = (E, P)$ be a CONSTRUCT query, $D = (G, \phi)$ an RDF$^i$ database and $F = \{f_\mu \mid \mu \in [\![P]\!]_D\}$ a fixed set of renaming functions. The answer to $q$ over $D$ (in symbols $[\![q]\!]_D$) is the RDF$^i$ database $D' = (G', \phi)$ where

$G' = \bigcup_{\mu = (\nu, \theta) \in [\![P]\!]_D} \{(t, \theta) \mid t \in (\mu(f_\mu(E)) \cap ((I \cup B) \times I \times T))\}$.

In the above definition, renaming functions are used to ensure that brand new blank nodes are created for each conditional mapping $\mu$. The intersection with $(I \cup B) \times I \times T$ makes sure that no illegal triples are returned as answers (see Example 6.28 above).
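Definitions 6.27 to 6.30 can be illustrated with a small sketch. The encoding is an assumption: variables start with `?`, blank nodes with `_:`, conditions are tuples of atomic constraints, and the membership test for $(I \cup B) \times I \times T$ is simplified to "no variable is left in the triple". The function name `construct` and the fresh-label scheme `_:b0, _:b1, ...` are hypothetical.

```python
import itertools


def is_var(t):
    return t.startswith("?")


def is_blank(t):
    # Assumption: blank nodes are written "_:label"; e-literals such as
    # "_R1" do not contain the colon and are therefore not renamed.
    return t.startswith("_:")


def construct(template, mappings, graph_blanks=frozenset()):
    """Build the conditional graph G' of a CONSTRUCT answer.

    template:     list of triples that may contain variables and blank nodes
    mappings:     list of (nu, theta) conditional mappings
    graph_blanks: blank nodes already used in the queried graph
    """
    # One shared supply of labels gives every mapping its own fresh blanks,
    # so the renaming functions are one-to-one with disjoint ranges.
    fresh = (
        b for b in (f"_:b{i}" for i in itertools.count())
        if b not in graph_blanks
    )
    answer = set()
    for nu, theta in mappings:
        rename = {}                        # the renaming function f_mu
        for triple in template:
            for term in triple:
                if is_blank(term) and term not in rename:
                    rename[term] = next(fresh)
        for triple in template:
            t = tuple(rename.get(x, nu.get(x, x)) for x in triple)
            if not any(is_var(x) for x in t):   # drop illegal triples
                answer.add((t, theta))
    return answer


E = [("?F", "type", "?Z"), ("?F", "occuredIn", "?S")]
mu4 = ({"?F": "fire1", "?S": "_R1"}, ())
```

Applied to the template and mapping of Example 6.28, `construct(E, [mu4])` keeps only the triple `(fire1, occuredIn, _R1)`, because the other triple still contains the unbound variable `?Z`.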

Example 6.31. The answer to the CONSTRUCT query from Example 6.26 can be obtained from the evaluation of the respective graph pattern from Example 6.19. The answer is the following RDF$^i$ database:

$(\ \{((\text{fire1}, \text{type}, \text{Fire}),\ \_R1 \ \text{NTPP}\ "x > 10 \wedge x < 21 \wedge y > 12 \wedge y < 17")\},\ \ \_R1 \ \text{NTPP}\ "x > 6 \wedge x < 23 \wedge y > 8 \wedge y < 19"\ )$

7. REPRESENTATION SYSTEMS FOR RDF$^i$

Let us now recall the semantics of RDF$^i$ as given by $Rep$. $Rep(D)$ is the set of possible RDF graphs corresponding to an RDF$^i$ database $D$. Clearly, if we were to evaluate a query $q$ over $D$, we could use the semantics of RDF$^i$ and evaluate $q$ over any RDF graph of $Rep(D)$ as follows:

$[\![q]\!]_{Rep(D)} = \{[\![q]\!]_G \mid G \in Rep(D)\}$

However, this is not the best answer we wish to have in terms of representation; we queried an RDF$^i$ database and got an answer which is a set of RDF graphs. Any well-defined query language should have the closure property, i.e., the output (answer) should be of the same type as the input. Ideally, we would like to have an RDF$^i$ database as the output. Thus, we are interested in finding an RDF$^i$ database $[\![q]\!]_D$ representing the answer $[\![q]\!]_{Rep(D)}$. This requirement is translated to the following formula:

$Rep([\![q]\!]_D) = [\![q]\!]_{Rep(D)}$    (1)

Formula (1) allows us to compute the answer to any query over an RDF$^i$ database in a consistent way with respect to the semantics of RDF$^i$ without having the need to apply the query on all possible RDF graphs. $[\![q]\!]_D$ can be computed using the algebra of Section 6 above. But can the algebra of Section 6 always compute such a database $[\![q]\!]_D$ representing $[\![q]\!]_{Rep(D)}$? In other words, can we prove (1) for all SPARQL queries considered in Section 6? The answer is no in general. The following example modelled after [6] illustrates this negative fact.

Example 7.1. Consider the RDF$^i$ database $D = (G, \phi)$, where $G = \{((s, p, o), \text{true})\}$ and $\phi = \text{true}$, i.e., $D$ contains the single triple $(s, p, o)$ where $s, p, o \in I$. Consider now a CONSTRUCT query $q$ over $D$ that selects all triples having $s$ as the subject. The algebraic version of query $q$ would be $(\{(s, ?p, ?o)\}, (s, ?p, ?o))$ and evaluated as $[\![q]\!]_D$ using Definition 6.30. Then, the triple $(s, p, o)$ and nothing else is in the resulting database $[\![q]\!]_D$. However, equation (1) is not satisfied, since for instance $(c, d, e)$ occurs in some $g \in Rep([\![q]\!]_D)$ according to the definition of $Rep$, whereas $(c, d, e) \notin g$ for all $g \in [\![q]\!]_{Rep(D)}$.

Note that the above counterexample to (1) exploits only the fact that RDF makes the OWA. In other words, the counterexample would hold for any approach to incomplete information in RDF which respects the OWA. Thus, unless the CWA is adopted, which we do not want to do since we are in the realm of RDF, condition (1) has to be relaxed.

In the rest of this section we follow the literature of incomplete information [11, 5] and show how (1) can be weakened. The key concept for achieving this is the concept of certain answer we defined earlier. Given a fixed fragment $\mathcal{Q}$ of SPARQL, two RDF$^i$ databases cannot be distinguished by $\mathcal{Q}$ if they give the same certain answer to every query in $\mathcal{Q}$. The next definition formalizes this fact using the concept of $\mathcal{Q}$-equivalence. Originally this concept was defined for incomplete relational databases in [11].

Definition 7.2. Let $\mathcal{Q}$ be a fragment of SPARQL, and $\mathcal{G}$, $\mathcal{H}$ two sets of RDF graphs. $\mathcal{G}$ and $\mathcal{H}$ are called $\mathcal{Q}$-equivalent (denoted by $\mathcal{G} \equiv_{\mathcal{Q}} \mathcal{H}$) if they give the same certain answer to every query in the language, that is, $\bigcap [\![q]\!]_{\mathcal{G}} = \bigcap [\![q]\!]_{\mathcal{H}}$ for all $q \in \mathcal{Q}$.

We can now define the notion of a representation system which gives a formal characterization of the correctness of computing the answer to a query directly on an RDF$^i$ database instead of using the set of possible graphs given by $Rep$. The definition of representation system (originally defined in [11] for incomplete relational databases) corresponds to the notion of weak query system defined in the same context by [5].

Definition 7.3. Let $\mathcal{D}$ be the set of all RDF$^i$ databases, $\mathcal{G}$ the set of all RDF graphs, $Rep : \mathcal{D} \to \mathcal{G}$ a function determining the set of possible RDF graphs corresponding to an RDF$^i$ database, and $\mathcal{Q}$ a fragment of SPARQL. The triple $(\mathcal{D}, Rep, \mathcal{Q})$ is a representation system if for all $D \in \mathcal{D}$ and all $q \in \mathcal{Q}$, there exists an RDF$^i$ database $[\![q]\!]_D \in \mathcal{D}$ such that the following condition is satisfied:

$Rep([\![q]\!]_D) \equiv_{\mathcal{Q}} [\![q]\!]_{Rep(D)}$

The next step towards the development of a representation system for RDF$^i$ and SPARQL is to introduce various fragments of SPARQL that we will consider and define the notions of monotonicity and coinitiality as is done in [11].

1 If the CWA is adopted, we can prove (1) using similar tech- 
niques to the ones that enable us to prove Theorem 7.14 
below. 



As in Section 6, our only addition to standard SPARQL is the extension of FILTERs with another kind of conditions that are constraints of $\mathcal{L}$. We also consider the fragment of SPARQL graph patterns known as well-designed. Well-designed graph patterns form a practical fragment of SPARQL graph patterns that includes the OPT operator, and it has been shown in [21, 3] that they have nice properties, such as lower combined complexity than in the general case, a normal form which is useful for optimization, and they are also weakly monotone. Thus, it is worth studying them in the context of RDF$^i$. Section A of the Appendix contains formal definitions and relevant background results for well-designed graph patterns.

Notation 2. We denote by $\mathcal{Q}^C_{\mathcal{F}}$ (resp. $\mathcal{Q}^S_{\mathcal{F}}$) the set of all CONSTRUCT (resp. SELECT) queries consisting of triple patterns, and graph pattern expressions from class $\mathcal{F}$. We also denote by $\mathcal{Q}^C_{WD}$ (resp. $\mathcal{Q}^S_{WD}$) the set of all CONSTRUCT (resp. SELECT) queries consisting of well-designed graph patterns. Last, we denote by $\mathcal{Q}^{C'}_{\mathcal{F}}$ all CONSTRUCT queries without blank nodes in their templates.

The following definition introduces the concept of monotone fragments of SPARQL applied to RDF graphs. Then, Proposition 7.5 gives us some fragments of SPARQL that are monotone.

Definition 7.4. A fragment $\mathcal{Q}$ of SPARQL is monotone if for every $q \in \mathcal{Q}$ and RDF graphs $G$ and $H$ such that $G \subseteq H$, it is $[\![q]\!]_G \subseteq [\![q]\!]_H$.

Proposition 7.5. The following results hold with respect to the monotonicity of SPARQL: a) Language $\mathcal{Q}^S_{AUF}$ is monotone. b) The presence of OPT or CONSTRUCT makes a fragment of SPARQL not monotone. c) Language $\mathcal{Q}^{C'}_{AUF}$ is monotone. d) Language $\mathcal{Q}^{C'}_{WD}$ is monotone.

Parts a)-c) of the above proposition are trivial extensions of relevant results in [3]. However, part d) is an interesting
result showing that the weak monotonicity property of well- 
designed graph patterns suffices to get a monotone fragment 
of SPARQL containing the OPT operator, i.e., the class of 
CONSTRUCT queries with well-designed graph patterns 
but without blank nodes in their templates. This property 
is not true for SELECT queries, and in this respect CON- 
STRUCT queries deserve closer attention. 

Monotonicity is a sufficient property for establishing our results about representation systems. Thus, in the following, we focus on the monotone query languages $\mathcal{Q}^{C'}_{AUF}$ and $\mathcal{Q}^{C'}_{WD}$.

Definition 7.6. Let $\mathcal{G}$ and $\mathcal{H}$ be sets of RDF graphs. We say that $\mathcal{G}$ and $\mathcal{H}$ are coinitial, denoted by $\mathcal{G} \approx \mathcal{H}$, if for any $G \in \mathcal{G}$ there exists $H \in \mathcal{H}$ such that $H \subseteq G$, and for any $H \in \mathcal{H}$ there exists $G \in \mathcal{G}$ such that $G \subseteq H$.

Example 7.7. The following sets are coinitial.

$\mathcal{G} = \{\{(a, b, c), (a, e, d), (a, f, g)\}, \{(a, b, c), (a, e, d)\}, \{(a, b, c)\}\}$
$\mathcal{H} = \{\{(a, b, c), (a, e, d)\}, \{(a, b, c)\}\}$

A direct consequence of the definition of coinitial sets is that they have the same greatest lower-bound elements with respect to the subset relation. In the above example, the greatest lower bound is $\bigcap \mathcal{G} = \bigcap \mathcal{H} = \{(a, b, c)\}$.
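Definition 7.6 translates directly into a check over finite sets of graphs. The sketch below models an RDF graph as a `frozenset` of triples, which is an encoding chosen for illustration; the function name `coinitial` is hypothetical.

```python
def coinitial(Gs, Hs):
    """Definition 7.6 for finite sets of graphs (frozensets of triples)."""
    return (
        all(any(H <= G for H in Hs) for G in Gs)       # each G contains some H
        and all(any(G <= H for G in Gs) for H in Hs)   # each H contains some G
    )


abc, aed, afg = ("a", "b", "c"), ("a", "e", "d"), ("a", "f", "g")

# The two sets of graphs of Example 7.7.
Gs = {frozenset({abc, aed, afg}), frozenset({abc, aed}), frozenset({abc})}
Hs = {frozenset({abc, aed}), frozenset({abc})}

# The greatest lower bound is the intersection of all graphs in a set.
glb_G = frozenset.intersection(*Gs)
glb_H = frozenset.intersection(*Hs)
```

For the sets of Example 7.7, `coinitial(Gs, Hs)` holds and both intersections equal the single-triple graph containing `(a, b, c)`.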



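Proposition 7.8. Let $\mathcal{Q}$ be a monotone fragment of SPARQL and $\mathcal{G}$ and $\mathcal{H}$ sets of RDF graphs. If $\mathcal{G} \approx \mathcal{H}$ then, for any $q \in \mathcal{Q}$, it holds that $[\![q]\!]_{\mathcal{G}} \approx [\![q]\!]_{\mathcal{H}}$.

Lemma 7.9. Let $\mathcal{G}$ and $\mathcal{H}$ be sets of RDF graphs. If $\mathcal{G}$ and $\mathcal{H}$ are coinitial then $\mathcal{G} \equiv_{\mathcal{Q}^{C'}_{AUF}} \mathcal{H}$.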

We will now present our main theorem which characterizes the evaluation of monotone $\mathcal{Q}^{C'}_{AUF}$ and $\mathcal{Q}^{C'}_{WD}$ queries (Theorem 7.14). Before we do this, we need a few definitions and preliminary results. The first definition allows us to apply a valuation to a conditional mapping. By applying a valuation to a conditional mapping, we get an ordinary mapping like in the case of RDF simply by disregarding the constraint that results since it is equivalent to true.

Definition 7.10. Let $v : U \to C$ be a valuation and $\mu = (\nu, \theta)$ a conditional mapping such that $M_{\mathcal{L}} \models v(\theta)$. Then $v(\mu)$ denotes the mapping that results from substituting in e-mapping $\nu$ the constant $v(\_l)$ for each e-literal $\_l$.

In a similar way, we can extend a valuation $v$ to a set of mappings $\Omega$ as follows.

Definition 7.11. Let $\Omega$ be a set of conditional mappings and $v : U \to C$ a valuation. Then

$v(\Omega) = \{v(\mu) \mid \mu = (\nu, \theta) \in \Omega \text{ and } M_{\mathcal{L}} \models v(\theta)\}$.

The next definition allows us to apply a valuation to an RDF$^i$ database.

Definition 7.12. Let $v : U \to C$ be a valuation and $D = (G, \phi)$ an RDF$^i$ database such that $M_{\mathcal{L}} \models v(\phi)$. Then $v(D)$ denotes the RDF graph $v(G)$.

Proposition 7.13. Let $D = (G, \phi)$ be an RDF$^i$ database, $q$ a query from a monotone fragment $\mathcal{Q}$ of SPARQL, and $v$ a valuation such that $M_{\mathcal{L}} \models v(\phi)$. Then, $v([\![q]\!]_D) = [\![q]\!]_{v(D)}$ implies $Rep([\![q]\!]_D) \approx [\![q]\!]_{Rep(D)}$.

We are now ready to prove our main result. 

Theorem 7.14. The triples $(\mathcal{D}, Rep, \mathcal{Q}^{C'}_{AUF})$ and $(\mathcal{D}, Rep, \mathcal{Q}^{C'}_{WD})$ are representation systems.

Since SELECT queries in SPARQL take as input an RDF 
graph but return a set of mappings (i.e., we do not have 
closure) , it is not clear how to include them in the developed 
concept of a representation system (but see the discussion 
about SELECT in Section 8 below). 

8. CERTAIN ANSWER COMPUTATION 

This section studies how the certain answer to a SPARQL query $q$ over an RDF$^i$ database $D$ can be computed, i.e., how to compute $\bigcap [\![q]\!]_{Rep(D)}$. Having Theorem 7.14, it is easy to compute the certain answer to a query in the fragment of SPARQL $\mathcal{Q}^{C'}_{AUF}$ or $\mathcal{Q}^{C'}_{WD}$. Since $(\mathcal{D}, Rep, \mathcal{Q}^{C'}_{AUF})$ and $(\mathcal{D}, Rep, \mathcal{Q}^{C'}_{WD})$ are representation systems, we can apply Definition 7.2 for the identity query to get $\bigcap [\![q]\!]_{Rep(D)} = \bigcap Rep([\![q]\!]_D)$ for all $q$ and $D$. Thus, we can equivalently compute $\bigcap Rep([\![q]\!]_D)$ where $[\![q]\!]_D$ can be computed using the algebra of Section 6.

Before presenting the algorithm for certain answer com- 
putation, we need to introduce some auxiliary constructs 
similar to the ones defined in [11, 5] in the case of incom- 
plete relational databases. 



Definition 8.1. Let $D = (G, \phi)$ be an RDF$^i$ database. The EQ-completed form of $D$ is the RDF$^i$ database $D^{EQ} = (G^{EQ}, \phi)$ where $G^{EQ}$ is the same as $G$ except that all e-literals $\_l \in U$ appearing in $G$ have been replaced in $G^{EQ}$ by the constant $c \in C$ such that $\phi \models \_l \ \text{EQ}\ c$ (if such a constant exists).

In other words, in the EQ-completed form of an RDF$^i$ database $D$, all e-literals that are entailed by the global constraint to be equal to a constant from $C$ are substituted by that constant in all the triples in which they appear.

Definition 8.2. Let $D = (G, \phi)$ be an RDF$^i$ database. The normalized form of $D$ is the RDF$^i$ database $D^* = (G^*, \phi)$ where $G^*$ is the set

$\{(t, \theta) \mid (t, \theta_i) \in G \text{ for all } i = 1 \ldots n, \text{ and } \theta \text{ is } \bigvee_i \theta_i\}$.

Given the above definition, the normalized form of an RDF$^i$ database $D$ is one that consists of the same global constraint and a graph in which conditional triples with the same triple part have been joined into a single conditional triple with a condition which is the disjunction of the conditions of the original triples. Notice that these new conditional triples do not follow Definition 4.1 which assumes conditions to be conjunctions of $\mathcal{L}$-constraints. We will allow this deviation from Definition 4.1 in this section.

Lemma 8.3. Let $D = (G, \phi)$ be an RDF$^i$ database. Then:

$\bigcap Rep(D) = \bigcap Rep((D^{EQ})^*)$

Having Lemma 8.3, it is easy to give an algorithm that 
computes the certain answer to a query. 

Theorem 8.4. Let $D = (G, \phi)$ be an RDF$^i$ database and $q$ a query from $\mathcal{Q}^{C'}_{AUF}$ or $\mathcal{Q}^{C'}_{WD}$. The certain answer of $q$ over $D$ can be computed as follows: i) compute $[\![q]\!]_D$ according to Section 6 and let $D_q = (G_q, \phi)$ be the resulting RDF$^i$ database, ii) compute the RDF$^i$ database $(H_q, \phi) = ((D_q)^{EQ})^*$, and iii) return the following set of RDF triples:

$\{(s, p, o) \mid ((s, p, o), \theta) \in H_q \text{ such that } \phi \models \theta \text{ and } o \notin U\}$
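Steps ii) and iii) of Theorem 8.4 can be sketched as follows, starting from an already evaluated conditional graph. Everything specific here is an assumption made for illustration: e-literals start with `_`, conjunctions are tuples of atomic constraints, and entailment is delegated to caller-supplied oracles. The two `toy_*` oracles are purely syntactic stand-ins, sound but incomplete; a real system would call a reasoner for the constraint language in use.

```python
from collections import defaultdict


def is_eliteral(t):
    return t.startswith("_")


def eq_complete(graph, phi, eq_constant):
    """Definition 8.1: replace e-literal objects entailed equal to a constant."""
    out = []
    for (s, p, o), theta in graph:
        c = eq_constant(phi, o) if is_eliteral(o) else None
        out.append(((s, p, o if c is None else c), theta))
    return out


def normalize(graph):
    """Definition 8.2: one entry per triple, holding the list of disjuncts."""
    grouped = defaultdict(list)
    for t, theta in graph:
        grouped[t].append(theta)
    return dict(grouped)


def certain_answer(graph_q, phi, entails, eq_constant):
    """Steps ii) and iii) of Theorem 8.4 for an evaluated graph G_q."""
    h_q = normalize(eq_complete(graph_q, phi, eq_constant))
    return {
        t for t, disjuncts in h_q.items()
        if not is_eliteral(t[2]) and entails(phi, disjuncts)
    }


def toy_eq_constant(phi, l):
    # Toy oracle (assumption): looks only for an explicit "l EQ c" atom in phi.
    for a, pred, b in phi:
        if a == l and pred == "EQ" and not is_eliteral(b):
            return b
    return None


def toy_entails(phi, disjuncts):
    # Toy oracle (assumption): phi entails the disjunction if every atom of
    # some disjunct literally occurs in phi. Sound, but far from complete.
    return any(all(atom in phi for atom in conj) for conj in disjuncts)


Q = '"x>10 & x<21 & y>12 & y<17"'
W = '"x>6 & x<23 & y>8 & y<19"'
g_q = [(("fire1", "type", "Fire"), (("_R1", "NTPP", Q),))]   # cf. Example 6.31
phi = (("_R1", "NTPP", W),)
```

With the toy oracles, `certain_answer(g_q, phi, toy_entails, toy_eq_constant)` is empty, since the condition of the only triple is not found in the global constraint. Passing a global constraint that contains `_R1 NTPP Q` makes the triple certain.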

Let us now present a preliminary analysis of the data complexity of computing the certain answer to a CONSTRUCT query over an RDF$^i$ database when $\mathcal{L}$ is a constraint language. Following [5], we first define the corresponding decision problem.

Definition 8.5. Let $q$ be a CONSTRUCT query. The certainty problem for query $q$, RDF graph $H$, and RDF$^i$ database $D$, is to decide whether $H \subseteq \bigcap [\![q]\!]_{Rep(D)}$. We denote this problem by $CERT_C(q, H, D)$.

The next theorem shows how one can transform the certainty problem we defined above to the problem of deciding whether $\psi \in Th(M_{\mathcal{L}})$ for an appropriate sentence $\psi$ of $\mathcal{L}$.

Theorem 8.6. Let $D = (G, \phi)$ be an RDF$^i$ database, $q$ a query from $\mathcal{Q}^{C'}_{AUF}$ or $\mathcal{Q}^{C'}_{WD}$, and $H$ an RDF graph. Then, $CERT_C(q, H, D)$ is equivalent to deciding whether the following formula is true in $M_{\mathcal{L}}$:

$\bigwedge_{t \in H} (\forall \vec{\_l})\,(\phi(\vec{\_l}) \supset \Theta(t, q, D, \vec{\_l}))$    (2)

In the above formula:

- $\vec{\_l}$ is the vector of all e-literals in the database $D$.

- $\Theta(t, q, D, \vec{\_l})$ is a disjunction $\theta_1 \vee \cdots \vee \theta_k$ that is constructed as follows. Let $[\![q]\!]_D = (G', \phi)$. $\Theta(t, q, D, \vec{\_l})$ has a disjunct $\theta_i$ for each conditional triple $(t'_i, \theta'_i) \in G'$ such that $t$ and $t'_i$ have the same subject and predicate. $\theta_i$ is:

  - $\theta'_i$ if $t$ and $t'_i$ have the same object as well.

  - $\theta'_i \wedge (\_l \ \text{EQ}\ o)$ if the object of $t$ is $o \in C$ and the object of $t'_i$ is $\_l \in U$.

If $t$ does not agree in the subject and predicate position with some $t'_i$, then $\Theta(t, q, D, \vec{\_l})$ is taken to be false.

We can also prove a theorem like the above for SELECT queries by defining the relevant decision problem and developing appropriate versions of the relevant results of Section 7 that are needed. This involves first modifying Definition 7.2 so that $\mathcal{H}$ and $\mathcal{G}$ are sets of sets of mappings and $q$ is a SELECT query form (we call this SELECT-equivalence). Then, the condition of Definition 7.3, modified so that $\mathcal{Q}$-equivalence is substituted by SELECT-equivalence, can be proved using essentially the same techniques as the ones used to prove Theorem B.1.

The above theorem immediately gives us some easy upper bounds on the data complexity of the certainty problem in the case of RDF$^i$ with $\mathcal{L}$ equal to TCL or PCL. The satisfiability problem for conjunctions of TCL-constraints is known to be in PTIME [24]. Thus, the entailment problems arising in Theorem 8.6 can be trivially solved in EXPTIME (to the best of our knowledge, no better bounds are known in the literature of TCL). Therefore, the certainty problem is also in EXPTIME.

[16] shows that conjunctions of atomic RCC-5 constraints involving constants that are polygons in U-representation (called landmarks in [16]) can be decided in PTIME. Therefore, by restricting PCL so that only RCC-5 constraints are allowed and constants are given in U-representation, the certainty problem in this case is also in EXPTIME.

If, instead of TCL or PCL, we consider a first-order constraint language that represents information about rectangles in $\mathbb{Q}^2$ using rational constants and order or difference constraints on the vertices of the rectangles, then from [12] it follows that the certainty problem is in coNP.

9. RELATED WORK 

Incomplete information has been studied in-depth in re- 
lational databases starting with the paper of [11]. More 
recently, papers on uncertain [25, 2] and probabilistic [27] 
database models have reignited interest in this area, partly 
motivated by new applications such as data cleaning, infor- 
mation extraction, sensor networks and scientific databases. 

In the context of the Web, incomplete information has re- 
cently been studied in detail for XML [1, 4]. Related work 
for incomplete information in RDF [8, 10, 3] has been dis- 
cussed in the introduction, so we do not repeat the details 
here. The study of incomplete information in RDF under- 
taken in this paper goes beyond [3] where only the issue of 
OWA for RDF is investigated. Other cases of incomplete information
in RDF (e.g., blank nodes according to the W3C
RDF semantics, which differs from the SPARQL semantics
as we pointed out in Section 6) can also be investigated
using an approach similar to ours. Comparing our work with
[8, 10], we point out that these papers study complementary
issues in the sense that they concentrate on temporal information
of a specific kind only (validity time for a tuple).
From a technical point of view, the approach of [10] is simi- 
lar to ours since it is based on constraints, but, whereas we 
concentrate on query answering for RDF 1 , [10] concentrates 
more on semantic issues such as temporal graph entailment. 
It is easy to see that RDF 1 can be used to represent in- 
complete temporal information that can be modeled as the 
object of a triple using any of the temporal constraint lan- 
guages of [13]. An example of this situation is when we want 
to represent incomplete information about the time an event 
occurred. This is called user-defined time in the temporal 
database literature and it has not been studied in [8, 10]. 

Recently, some papers have started studying the problem 
of representing probabilistic information in RDF and query- 
ing it using SPARQL [28, 15]. It would be interesting to 
investigate how these approaches can be combined with the 
work presented in this paper as [7] has done in the model of 
probabilistic c-tables. 

In this paper we gave examples of the use of RDF 1 in 
geospatial applications and presented preliminary complex- 
ity results for constraint languages TCL and PCL. Thus, it 
is interesting to compare the expressive power that RDF 1 
gives us to other recent works that use Semantic Web data 
models and languages for geospatial applications. 

When equipped with a constraint language like TCL or 
PCL, RDF 1 goes beyond the proposals of [14] and [18] that 
cannot express incomplete geospatial information. Incom- 
plete geospatial information as it is studied in this paper 
can also be expressed in spatial description logics [19, 17]. 
For efficiency reasons, spatial DL reasoners such as Racer- 
Pro 2 and PelletSpatial 3 have opted for separating spatial 
relations from standard DL axioms as we have done by sep- 
arating graphs and constraints. Since RDF graphs can be 
seen as DL ABoxes with atomic concepts only, all the results 
of this paper can be trivially transferred to the relevant sub- 
sets of spatial DLs and their reasoners. 

10. FUTURE WORK 

Our future work focuses on the following: 1) studying
the complexity of evaluating various fragments of SPARQL
over RDF 1 databases, as has been done in [21, 26] for the
case of SPARQL and RDF, 2) proving tighter complexity
bounds for certain answer computation for various spatial
and temporal constraint languages $\mathcal{L}$ such as the ones used in
this paper and the papers [10] and [12], and 3) exploring
other fragments of SPARQL that can be used to define a
representation system for RDF 1 .
APPENDIX

The Appendix is structured as follows. Section A gives for- 
mal definitions for the concept of well-designed graph pat- 
terns and relevant concepts, such as subsumption for map- 
pings and weak monotonicity, while it presents known results 
for well-designed patterns. Section B provides the proofs for 
the section of Representation Systems (Section 7), while Sec- 
tion C provides the proofs for the section of Certain Answer 
Computation (Section 8). Finally, Section D is devoted to additional
propositions that, although useful in the proofs of other
propositions and theorems, are not significant enough to be
included in the main text of this paper.

A. WELL-DESIGNED GRAPH PATTERNS 

In this section we present relevant material and known 
results for the fragment of SPARQL corresponding to the 
notion of well-designed graph patterns. These come from 
[21, 3]. 

The next definition introduces the notion of well-designed 
graph patterns. 

Definition A.1 (Well-designed Patterns [3]).
Let P be a graph pattern in the AND-FILTER-OPT
fragment of SPARQL. Then P is well-designed if (1) P is
safe, i.e., for every sub-pattern $(P_1\ \mathrm{FILTER}\ R)$ of P, it
holds that $\mathrm{var}(R) \subseteq \mathrm{var}(P_1)$, and (2) for every sub-pattern
$P' = (P_1\ \mathrm{OPT}\ P_2)$ of P and variable ?X, if ?X occurs both
inside $P_2$ and outside $P'$, then it also occurs in $P_1$.
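The two conditions of Definition A.1 can be checked syntactically. The following is a minimal sketch, not part of the original development: it assumes a hypothetical encoding of patterns as nested tuples (`("T", (s, p, o))`, `("AND", P1, P2)`, `("OPT", P1, P2)`, `("FILTER", P1, vars_of_R)`), with variables written as strings starting with `?` and a filter condition reduced to the list of its variables.

```python
def variables(p):
    """Set of variables (strings starting with '?') occurring in pattern p."""
    kind = p[0]
    if kind == "T":                       # ("T", (s, p, o)) triple pattern
        return {term for term in p[1] if term.startswith("?")}
    if kind == "FILTER":                  # ("FILTER", P1, vars_of_R)
        return variables(p[1]) | set(p[2])
    return variables(p[1]) | variables(p[2])    # ("AND" | "OPT", P1, P2)


def is_well_designed(p, outside=frozenset()):
    """Check conditions (1) and (2) of Definition A.1.

    `outside` holds the variables occurring outside the sub-pattern p.
    """
    kind = p[0]
    if kind == "T":
        return True
    if kind == "FILTER":
        p1, r_vars = p[1], set(p[2])
        if not r_vars <= variables(p1):   # condition (1): safety
            return False
        return is_well_designed(p1, outside | r_vars)
    p1, p2 = p[1], p[2]
    v1, v2 = variables(p1), variables(p2)
    # condition (2): variables of P2 that also occur outside must occur in P1
    if kind == "OPT" and not (v2 & outside) <= v1:
        return False
    return (is_well_designed(p1, outside | v2)
            and is_well_designed(p2, outside | v1))


good = ("OPT", ("T", ("?X", "name", "?N")), ("T", ("?X", "email", "?E")))
bad = ("AND",
       ("T", ("?X", "name", "paul")),
       ("OPT", ("T", ("?Y", "name", "george")), ("T", ("?X", "email", "?Z"))))
unsafe = ("FILTER", ("T", ("?X", "p", "?Y")), ["?Z"])
```

In `bad`, the variable ?X occurs inside the optional part and outside the OPT sub-pattern, but not in its left operand, so condition (2) fails.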

In [21, 3], the authors identified properties of well-designed
graph patterns that make query evaluation more efficient
than it is without the syntactic restrictions imposed on
graph patterns by Definition A.1 above. One of these properties is that
the fragment of SPARQL graph patterns corresponding to
well-designed graph patterns is weakly monotone. In the
following we introduce the notion of weak monotonicity, but
first we define the notion of subsumption for mappings, which
is needed for weak monotonicity.

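Definition A.2 (Subsumption of Mappings). Let
$\mu_1, \mu_2$ be mappings. We say that $\mu_1$ is subsumed by $\mu_2$,
denoted by $\mu_1 \preceq \mu_2$, if $\mathrm{dom}(\mu_1) \subseteq \mathrm{dom}(\mu_2)$ and $\mu_1(x) = \mu_2(x)$
for every $x \in \mathrm{dom}(\mu_1)$. Let $\Omega_1, \Omega_2$ be sets of mappings. We
say that $\Omega_1$ is subsumed by $\Omega_2$, denoted by $\Omega_1 \sqsubseteq \Omega_2$, if
for every $\mu_1 \in \Omega_1$ there exists a mapping $\mu_2 \in \Omega_2$ such that
$\mu_1 \preceq \mu_2$.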

Example A. 3. Let us consider Example 6.2 again. Map- 
ping Hi *s subsumed by mapping Hi, i-e., Hi ^ Mi- 

Informally, when a mapping $\mu$ subsumes a mapping $\mu'$,
then $\mu$ contains additional information with respect to $\mu'$, i.e., it maps
additional variables to RDF terms.

Definition A.4 (Weak Monotonicity). Let P be a
graph pattern of SPARQL. P is said to be weakly monotone
if for every pair G, H of RDF graphs such that $G \subseteq H$, it holds that
$[\![P]\!]_G \sqsubseteq [\![P]\!]_H$.

From [3] we know that every well-designed graph pattern 
is weakly monotone. 

Theorem A. 5 (Theorem 4.3 of [3]). Every well- 
designed graph pattern is weakly monotone. 
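To make the difference between monotonicity and weak monotonicity concrete, the following sketch evaluates one well-designed OPT pattern over two graphs $G \subseteq H$. It is an illustration under simplifying assumptions, not the evaluation procedure of the paper: mappings are plain Python dicts, graphs are sets of string triples, and the names `match`, `opt`, and `evaluate` are hypothetical helpers.

```python
def match(tp, graph):
    """Mappings (dicts) that send triple pattern `tp` to a triple of `graph`."""
    result = []
    for triple in graph:
        mu = {}
        for term, value in zip(tp, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break                 # same variable, two different values
            elif term != value:
                break                     # constant does not match
        else:
            result.append(mu)
    return result


def compatible(m1, m2):
    """Two mappings are compatible if they agree on their shared variables."""
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())


def opt(omega1, omega2):
    """Left outer join: (omega1 JOIN omega2) UNION (omega1 MINUS omega2)."""
    joined = [{**a, **b} for a in omega1 for b in omega2 if compatible(a, b)]
    rest = [a for a in omega1 if not any(compatible(a, b) for b in omega2)]
    return joined + rest


def subsumed(m1, m2):
    """m1 is subsumed by m2 (Definition A.2)."""
    return all(x in m2 and m2[x] == m1[x] for x in m1)


def set_subsumed(omega1, omega2):
    """omega1 is subsumed by omega2 (Definition A.2)."""
    return all(any(subsumed(a, b) for b in omega2) for a in omega1)


def evaluate(graph):
    # P = ((?x name ?n) OPT (?x email ?e)), a well-designed pattern
    return opt(match(("?x", "name", "?n"), graph),
               match(("?x", "email", "?e"), graph))


G = {("a", "name", "Ann")}
H = G | {("a", "email", "ann@example.org")}
```

Over G the pattern yields a mapping that binds only ?x and ?n; over H that mapping disappears and is replaced by one that also binds ?e. The answer over G is therefore not a subset of the answer over H, but it is subsumed by it, which is exactly what weak monotonicity requires.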
B. PROOFS FOR SECTION 7
B.1 Proof of Proposition 7.5

Proof for part a) 

The monotonicity property for $\mathcal{Q}_{AUF}$ follows easily from the
monotonicity property of graph patterns containing only
the AND, UNION, and FILTER operators as presented
in [3, Lemma 3.2].

Proof for part b) 

From the same paper, it trivially follows that Qqpt and
Qopt are not monotone.

Proof for part c) 

Now consider a query $q = (E, P) \in \mathcal{Q}_{AUF}$ and let G, H be
two RDF graphs such that $G \subseteq H$. According to Definition 4.6
of CONSTRUCT for RDF graphs as given in [20]
we have

$$[\![q]\!]_G = \bigcup_{\mu \in [\![P]\!]_G} \{ \mu(f_\mu(E)) \cap ((I \cup B) \times I \times T) \} \qquad (1)$$

$$[\![q]\!]_H = \bigcup_{\mu \in [\![P]\!]_H} \{ \mu(f_\mu(E)) \cap ((I \cup B) \times I \times T) \} \qquad (2)$$

From the monotonicity property of AUF graph patterns [3]
we have that $[\![P]\!]_G \subseteq [\![P]\!]_H$. Therefore, all mappings
$\mu$ appearing in the union expression of formula (1)
appear also in the union expression of formula (2). Therefore,
if the sets in formulae (1) and (2) are the same, then
we shall get the required relation for monotonicity, that is,
$[\![q]\!]_G \subseteq [\![q]\!]_H$.

Notice, however, that this is not the case because of the
renaming functions. According to Definition 4.5 of [20], a renaming
function depends not only on the mapping that has
been constructed from the evaluation of a graph pattern, but
also on the underlying RDF graph over which the graph pattern
is evaluated. Thus, besides renaming each blank node of the
template to a different one for each mapping in the solution,
a renaming function must map it to a fresh blank
node that does not appear in the underlying RDF graph. Therefore,
a renaming function used in formula (1) could have
renamed a blank node to one that is fresh with respect to G,
but not with respect to H, i.e., that blank node could
already appear in H.

Hence, in order to have the monotonicity property for
$\mathcal{Q}_{AUF}$, we have to restrict ourselves to CONSTRUCT
queries without blank nodes in their template. In such a
case, the renaming functions do not have any effect on the
templates of CONSTRUCT queries. Hence, the sets in formulae
(1) and (2) are the same for the same mappings, and
thus,

$$[\![q]\!]_G \subseteq [\![q]\!]_H.$$

Proof for part d) 

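Consider a query $q = (E, P) \in \mathcal{Q}_{WD}$ and let G, H be two
RDF graphs such that $G \subseteq H$. According to Definition 4.6
of CONSTRUCT for RDF graphs as given in [20] we have

$$[\![q]\!]_G = \bigcup_{\mu \in [\![P]\!]_G} \{ \mu(f_\mu(E)) \cap ((I \cup B) \times I \times T) \} \qquad (3)$$

$$[\![q]\!]_H = \bigcup_{\mu' \in [\![P]\!]_H} \{ \mu'(f_{\mu'}(E)) \cap ((I \cup B) \times I \times T) \} \qquad (4)$$

Since the template E does not contain blank nodes, we
can omit the renaming functions from these expressions and
get

$$[\![q]\!]_G = \bigcup_{\mu \in [\![P]\!]_G} \{ \mu(E) \cap ((I \cup B) \times I \times T) \} \qquad (5)$$

$$[\![q]\!]_H = \bigcup_{\mu' \in [\![P]\!]_H} \{ \mu'(E) \cap ((I \cup B) \times I \times T) \} \qquad (6)$$

Since P is well-designed, it follows from Theorem A.5 that
P is weakly monotone. Therefore, $[\![P]\!]_G \sqsubseteq [\![P]\!]_H$. Hence, for
every mapping $\mu \in [\![P]\!]_G$ there exists a mapping $\mu' \in [\![P]\!]_H$
such that $\mu \preceq \mu'$. This means that $\mu, \mu'$ map the common
variables of their domains to the same RDF terms. Hence,
if a mapping $\mu \in [\![P]\!]_G$ produces triple t in $[\![q]\!]_G$, that triple
is also produced in $[\![q]\!]_H$ by a mapping $\mu' \in [\![P]\!]_H$ such that
$\mu \preceq \mu'$. Thus, $[\![q]\!]_G \subseteq [\![q]\!]_H$.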
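The last step of the argument for part d) can be illustrated with a small sketch. It assumes a blank-node-free template given as a list of triple patterns and approximates the well-formedness filter $(I \cup B) \times I \times T$ by simply dropping template triples with an unbound variable; the function name `instantiate` and the sample data are hypothetical.

```python
def instantiate(template, mu):
    """Triples produced by `template` under mapping `mu`.

    Only template triples whose variables are all bound by `mu` are kept,
    a simplified stand-in for the well-formedness filter.
    """
    triples = set()
    for tp in template:
        if all(term in mu for term in tp if term.startswith("?")):
            triples.add(tuple(mu.get(term, term) for term in tp))
    return triples


E = [("?x", "hasName", "?n"), ("?x", "hasEmail", "?e")]
mu = {"?x": "a", "?n": "Ann"}                                   # from [[P]]_G
mu_prime = {"?x": "a", "?n": "Ann", "?e": "ann@example.org"}    # from [[P]]_H
```

Because `mu` is subsumed by `mu_prime`, every triple that `mu` produces from E is also produced by `mu_prime`; without blank nodes in the template there is no renaming step that could break this inclusion.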

B.2 Proof of Proposition 7.8 

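The proof of Proposition 7.8 is straightforward from the
monotonicity property. Since $\mathcal{G} \approx \mathcal{H}$ we have the following:

• for every $G \in \mathcal{G}$ there exists $H \in \mathcal{H}$ such that $H \subseteq G$
and

• for every $H \in \mathcal{H}$ there exists $G \in \mathcal{G}$ such that $G \subseteq H$.

Let $q \in \mathcal{Q}$. Because $\mathcal{Q}$ is monotone, from the first item above
we have that $[\![q]\!]_H \subseteq [\![q]\!]_G$ for every $G \in \mathcal{G}$ and some $H \in \mathcal{H}$.
Notice also that this property holds for every set making up
$[\![q]\!]_{\mathcal{G}}$ and some $[\![q]\!]_H \in [\![q]\!]_{\mathcal{H}}$. Similarly, from the second item
above we have that $[\![q]\!]_G \subseteq [\![q]\!]_H$ for every $H \in \mathcal{H}$ and some
$G \in \mathcal{G}$. Notice again that this property holds for every set
making up $[\![q]\!]_{\mathcal{H}}$ and some $[\![q]\!]_G \in [\![q]\!]_{\mathcal{G}}$. Hence, $[\![q]\!]_{\mathcal{G}}$ and
$[\![q]\!]_{\mathcal{H}}$ are coinitial, that is,

$$[\![q]\!]_{\mathcal{G}} \approx [\![q]\!]_{\mathcal{H}}.$$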
B.3 Proof of Lemma 7.9 

The proof is similar to the one given in [11, Lemma 4.2]. 

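We have to prove that $\bigcap [\![q]\!]_{\mathcal{G}} = \bigcap [\![q]\!]_{\mathcal{H}}$ for every $q \in \mathcal{Q}_{AUF}$.
Let $\mathcal{G} \approx \mathcal{H}$. Then, from Proposition 7.8 and because of the
monotonicity property of $\mathcal{Q}_{AUF}$, we have that $[\![q]\!]_{\mathcal{G}} \approx [\![q]\!]_{\mathcal{H}}$
for every $q \in \mathcal{Q}_{AUF}$. Thus, for every $[\![q]\!]_G \in [\![q]\!]_{\mathcal{G}}$ there
exists an $[\![q]\!]_{H_G} \in [\![q]\!]_{\mathcal{H}}$ such that $[\![q]\!]_{H_G} \subseteq [\![q]\!]_G$. So, we
have

$$\bigcap [\![q]\!]_{\mathcal{G}} = \bigcap_{G \in \mathcal{G}} [\![q]\!]_G \supseteq \bigcap_{G \in \mathcal{G}} [\![q]\!]_{H_G} \supseteq \bigcap_{H \in \mathcal{H}} [\![q]\!]_H = \bigcap [\![q]\!]_{\mathcal{H}}.$$

To see why $\bigcap_{G \in \mathcal{G}} [\![q]\!]_G \supseteq \bigcap_{G \in \mathcal{G}} [\![q]\!]_{H_G}$, notice that
$\bigcap_{G \in \mathcal{G}} [\![q]\!]_G$ and $\bigcap_{G \in \mathcal{G}} [\![q]\!]_{H_G}$ can be written respectively as

$$[\![q]\!]_{G_1} \cap [\![q]\!]_{G_2} \cap \cdots \quad \text{and} \quad [\![q]\!]_{H_{G_1}} \cap [\![q]\!]_{H_{G_2}} \cap \cdots$$

and that

$$[\![q]\!]_{H_{G_i}} \subseteq [\![q]\!]_{G_i}.$$

Therefore, if an element x is in $\bigcap_{G \in \mathcal{G}} [\![q]\!]_{H_G}$, it will be in every
$[\![q]\!]_{H_{G_i}}$ and thus it will be in every $[\![q]\!]_{G_i}$, which proves the
relation.

Now, to see why $\bigcap_{G \in \mathcal{G}} [\![q]\!]_{H_G} \supseteq \bigcap_{H \in \mathcal{H}} [\![q]\!]_H$, notice that the
relation can be written as

$$\bigcap [\![q]\!]_{\mathcal{H}_{\mathcal{G}}} \supseteq \bigcap [\![q]\!]_{\mathcal{H}}$$

where

$$\mathcal{H}_{\mathcal{G}} = \{ H \in \mathcal{H} \mid H \subseteq G \text{ for some } G \in \mathcal{G} \}.$$

Thus, $\mathcal{H}_{\mathcal{G}} \subseteq \mathcal{H}$, and therefore, we have that $\bigcap \mathcal{H}_{\mathcal{G}} \supseteq \bigcap \mathcal{H}$.
Similarly if q is a monotone query, we have $[\![q]\!]_{\mathcal{H}_{\mathcal{G}}} \subseteq [\![q]\!]_{\mathcal{H}}$
and $\bigcap [\![q]\!]_{\mathcal{H}_{\mathcal{G}}} \supseteq \bigcap [\![q]\!]_{\mathcal{H}}$.

Therefore, we showed that

$$\bigcap [\![q]\!]_{\mathcal{G}} \supseteq \bigcap [\![q]\!]_{\mathcal{H}}.$$

We work similarly to prove

$$\bigcap [\![q]\!]_{\mathcal{H}} \supseteq \bigcap [\![q]\!]_{\mathcal{G}}$$

and get

$$\bigcap [\![q]\!]_{\mathcal{G}} = \bigcap [\![q]\!]_{\mathcal{H}}.$$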
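The set-theoretic core of this argument, namely that two coinitial families of sets have the same intersection, can be checked on a small example. The sketch below works on plain finite sets of opaque triple identifiers; `coinitial` and `intersection` are hypothetical helper names, and the two families are made-up data rather than query answers from the paper.

```python
def coinitial(gs, hs):
    """True if every G in gs contains some H in hs, and vice versa."""
    return (all(any(h <= g for h in hs) for g in gs)
            and all(any(g <= h for g in gs) for h in hs))


def intersection(graphs):
    """Intersection of a non-empty list of frozensets."""
    return frozenset.intersection(*graphs)


GS = [frozenset({"t1"}), frozenset({"t1", "t2"})]
HS = [frozenset({"t1"}), frozenset({"t1", "t3"})]
```

Here the two families differ (one mentions `t2`, the other `t3`), yet they are coinitial, and both intersections equal `{"t1"}`.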
B.4 Proof of Proposition 7.13 

Let $[\![q]\!]_D$ be the pair $D_1 = (G_1, \phi)$ and $G'$ an RDF graph
such that $G' \in Rep(D_1)$. From the definition of Rep, there
exists a valuation $v'$ such that $\mathcal{M}_{\mathcal{L}} \models v'(\phi)$ and $G' \supseteq v'(G_1)$.
From the assumption that $v([\![q]\!]_D) = [\![q]\!]_{v(D)}$ and since
$\mathcal{M}_{\mathcal{L}} \models v'(\phi)$ we get

$$G' \supseteq v'(G_1) = v'(D_1) = v'([\![q]\!]_D) = [\![q]\!]_{v'(D)} = H$$

where H is a new symbol introduced for convenience. Now,
observe that $v'(D)$ is the RDF graph $v'(G)$, which is an element
of $Rep(D)$ since $\mathcal{M}_{\mathcal{L}} \models v'(\phi)$. Since also

$$[\![q]\!]_{Rep(D)} = \{ [\![q]\!]_G \mid G \in Rep(D) \}$$

it turns out that $H \in [\![q]\!]_{Rep(D)}$. To see this, notice that

$$H = [\![q]\!]_{v'(D)}$$

and that

$$v'(D) \in Rep(D).$$

This proves that for each $G' \in Rep([\![q]\!]_D)$ there exists
an $H \in [\![q]\!]_{Rep(D)}$ such that $H \subseteq G'$. To prove that
$Rep([\![q]\!]_D) \approx [\![q]\!]_{Rep(D)}$ we need to show the same for the
other direction.

Let $H'$ be an RDF graph such that $H' \in [\![q]\!]_{Rep(D)}$. Then
$H' = [\![q]\!]_H$ for some $H \in Rep(D)$. From the definition
of Rep, there exists a valuation $v'$ such that $\mathcal{M}_{\mathcal{L}} \models v'(\phi)$
and $H \supseteq v'(G)$ or equivalently $H \supseteq v'(D)$. From our assumption
that $v([\![q]\!]_D) = [\![q]\!]_{v(D)}$ and $\mathcal{M}_{\mathcal{L}} \models v'(\phi)$, we have
$[\![q]\!]_{v'(D)} = v'([\![q]\!]_D)$. Since q belongs to a monotone fragment
of SPARQL and $H \supseteq v'(D)$, we have

$$[\![q]\!]_H \supseteq [\![q]\!]_{v'(D)}$$

which is equivalent to

$$H' \supseteq v'([\![q]\!]_D).$$

Now observe that since $\mathcal{M}_{\mathcal{L}} \models v'(\phi)$, $v'([\![q]\!]_D)$ is an RDF
graph $G'$ and that $G' \in Rep([\![q]\!]_D)$. Therefore, we showed
that for every $H' \in [\![q]\!]_{Rep(D)}$ there exists a $G' \in Rep([\![q]\!]_D)$
such that $G' \subseteq H'$.
Hence

$$Rep([\![q]\!]_D) \approx [\![q]\!]_{Rep(D)}.$$
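The proof repeatedly uses two operations: applying a valuation to a conditional graph, and testing membership in Rep(D). The sketch below illustrates both under strong simplifying assumptions that are not part of the paper: conditions and the global constraint are modelled as Python predicates over a valuation (a dict from e-literals to constants), and the search for a witness valuation is restricted to a finite list of candidates.

```python
def apply_valuation(v, cond_graph):
    """v(G): keep the triples whose condition holds under v and
    replace every e-literal by the constant v assigns to it."""
    return {tuple(v.get(term, term) for term in triple)
            for triple, condition in cond_graph
            if condition(v)}


def in_rep(graph, cond_graph, phi, valuations):
    """graph is in Rep(D) if it contains v(G) for some candidate v
    that satisfies the global constraint phi."""
    return any(phi(v) and graph >= apply_valuation(v, cond_graph)
               for v in valuations)


# Hypothetical database D = (G, phi) with a single e-literal "_l1".
G = [(("h1", "locatedIn", "_l1"), lambda v: True),
     (("h1", "type", "Hotspot"), lambda v: v["_l1"] != "r3")]
phi = lambda v: v["_l1"] in {"r1", "r2"}
candidates = [{"_l1": r} for r in ("r1", "r2", "r3")]

world = {("h1", "locatedIn", "r1"), ("h1", "type", "Hotspot"),
         ("h2", "type", "Hotspot")}
not_a_world = {("h1", "locatedIn", "r3"), ("h1", "type", "Hotspot")}
```

The graph `world` contains the image of G under the valuation that maps `_l1` to `r1` (plus one extra triple, which the open-world reading of Rep allows), whereas `not_a_world` would require a valuation that violates the global constraint.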



B.5 Proof of Theorem 7.14 

The proof for item a) can be found in the proof of Theorem
B.1, while the proof for item b) can be found in the
proof of Theorem B.2 below.

Theorem B.1. The triple $\langle \mathcal{D}, Rep, \mathcal{Q}_{AUF} \rangle$ is a representation
system.

Proof. To prove Theorem B.1, it is sufficient to show
that for any $D = (G, \phi) \in \mathcal{D}$ and any query $q = (E, P) \in \mathcal{Q}_{AUF}$
it is possible to define $[\![q]\!]_D$ in such a way that

$$Rep([\![q]\!]_D) \equiv_{\mathcal{Q}_{AUF}} [\![q]\!]_{Rep(D)}.$$

By Lemma 7.9 it is sufficient to prove that

$$Rep([\![q]\!]_D) \approx [\![q]\!]_{Rep(D)}. \qquad (1)$$

From Proposition 7.13 it now suffices to prove that for any
valuation v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$ it holds that

$$v([\![q]\!]_D) = [\![q]\!]_{v(D)}.$$

From Proposition D.2, the above holds if for any valuation
v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$, the following holds:

$$v([\![P]\!]_D) = [\![P]\!]_{v(D)}.$$

This is done by induction on the structure of graph patterns
P of $\mathcal{Q}_{AUF}$.

• P is (s, p, o) (base case):

We shall prove that $v([\![P]\!]_D) = [\![P]\!]_{v(D)}$.
Let $\mu \in v([\![P]\!]_D)$. Then, there exists a conditional mapping
$\mu' = (\nu', \theta') \in [\![P]\!]_D$ such that $v(\mu') = \mu$ and
$\mathcal{M}_{\mathcal{L}} \models v(\theta')$.

We now distinguish two cases corresponding to the two
cases of Definition 6.15 (1):

(i) In this case $o \in C$. In this case $\mathrm{dom}(\nu')$ does
not contain any special query variable, hence the
application of v to $\mu'$ leaves $\nu'$ unchanged. In
other words $\mu = v(\mu') = \nu'$.

Now we have two cases corresponding to the two
sets making up $[\![P]\!]_D$.

If $\mu = \nu'$ is an element of the first set, then

$$(\mu'(P), \theta') \in G.$$

Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$, this is written as

$$v(\mu'(P)) \in v(G)$$

and because $\mathcal{M}_{\mathcal{L}} \models v(\phi)$, this is equivalent to

$$v(\mu'(P)) \in v(D).$$

Since also $v(\mu') = \mu$, we have

$$\mu(P) \in v(D)$$

and hence

$$\mu \in [\![P]\!]_{v(D)}.$$

If $\mu = \nu'$ is an element of the second set then
$\theta'$ is $\theta \wedge (\_l\ \mathrm{EQ}\ o)$. Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$, we have
$\mathcal{M}_{\mathcal{L}} \models v(\theta)$ and $\mathcal{M}_{\mathcal{L}} \models v(\_l\ \mathrm{EQ}\ o)$. From the
second set of the first item of Definition 6.15 that
makes up $[\![P]\!]_D$, we have

$$((\mu(s), \mu(p), \_l), \theta) \in G.$$

Since $\mathcal{M}_{\mathcal{L}} \models v(\theta)$, we can apply v to the above
and get

$$v((\mu(s), \mu(p), \_l)) \in v(G).$$

Since also $\mathcal{M}_{\mathcal{L}} \models v(\phi)$ and $\mathcal{M}_{\mathcal{L}} \models v(\_l\ \mathrm{EQ}\ o)$ we
get

$$(\mu(s), \mu(p), o) \in v(D)$$

which is equivalently written as

$$\mu(P) \in v(D)$$

or

$$\mu \in [\![P]\!]_{v(D)}.$$

(ii) In this case $o \in I \cup B \cup L \cup V$. Therefore $\nu'(o) \in
I \cup B \cup L \cup U \cup C$ and

$$(\nu'(P), \theta') \in G.$$

Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$ we can apply v to the previous
relation and get

$$v(\mu'(P)) \in v(G).$$

Because also $\mathcal{M}_{\mathcal{L}} \models v(\phi)$, we have

$$v(\mu'(P)) \in v(D).$$

The latter fact together with the fact that $v(\mu') = \mu$
gives that

$$\mu(P) \in v(D)$$

and hence

$$\mu \in [\![P]\!]_{v(D)}.$$

This establishes the fact that $v([\![P]\!]_D) \subseteq [\![P]\!]_{v(D)}$. The
other direction of the proof is similar and goes as follows.

Let $\mu \in [\![P]\!]_{v(D)}$. Then, $\mu(P) \in v(D)$.

We now distinguish two cases corresponding to the two
cases of Definition 6.15 (1):

(i) In this case $o \in C$. Then, $\mathrm{dom}(\mu)$ does not contain
any special query variable. Since $\mu(P) \in v(D)$,
there exists conditional triple $((\mu(s), \mu(p), x), \theta) \in G$
such that $\mathcal{M}_{\mathcal{L}} \models v(\theta)$ and $v(x) = o$.
Now, we have two cases for x corresponding to the
two sets making up $[\![P]\!]_D$ in Definition 6.15 (1):

- x is o. Then, a conditional mapping $\mu' = (\mu, \theta)$
is an element of the first set, i.e., $\mu' \in [\![P]\!]_D$.
Since $\mathcal{M}_{\mathcal{L}} \models v(\theta)$ we can apply v to
relation

$$\mu' \in [\![P]\!]_D$$

and get

$$v(\mu') \in v([\![P]\!]_D).$$

Because $\mathrm{dom}(\mu)$ does not contain any special
query variable the application of v to $\mu'$ leaves
$\mu'$ unchanged. Therefore,

$$v(\mu') \in v([\![P]\!]_D)$$

becomes

$$\mu \in v([\![P]\!]_D).$$

- x is $\_l$. Then, a conditional mapping $\mu' =
(\mu, \theta \wedge \_l\ \mathrm{EQ}\ o)$ is an element of the second
set, i.e., $\mu' \in [\![P]\!]_D$. Since $v(\_l) = v(x) = o$,
we have $\mathcal{M}_{\mathcal{L}} \models v(\_l\ \mathrm{EQ}\ o)$. Because also
$\mathcal{M}_{\mathcal{L}} \models v(\theta)$, it holds that $\mathcal{M}_{\mathcal{L}} \models v(\theta \wedge \_l\ \mathrm{EQ}\ o)$,
and hence we can apply v to relation

$$\mu' \in [\![P]\!]_D$$

and get

$$v(\mu') \in v([\![P]\!]_D).$$

Because $\mathrm{dom}(\mu)$ does not contain any special
query variable the application of v to $\mu'$ leaves
$\mu'$ unchanged. Therefore,

$$v(\mu') \in v([\![P]\!]_D)$$

becomes

$$\mu \in v([\![P]\!]_D).$$

(ii) In this case $o \in I \cup B \cup L \cup V$. We have two cases
to consider.

If $o \in I \cup B \cup L \cup V_n$, then $\mathrm{dom}(\mu)$ does not contain
any special query variable. Since $\mu(P) \in v(D)$,
there exists conditional triple $(\mu(P), \theta) \in G$ such
that $\mathcal{M}_{\mathcal{L}} \models v(\theta)$. By the "else" part of Definition
6.15 the conditional mapping $\mu' = (\mu, \theta)$ is an
element of $[\![P]\!]_D$, that is,

$$\mu' \in [\![P]\!]_D.$$

Since $\mathcal{M}_{\mathcal{L}} \models v(\theta)$, we can apply valuation v to
this relation and get

$$v(\mu') \in v([\![P]\!]_D)$$

which is equivalent to

$$\mu \in v([\![P]\!]_D)$$

since the application of v to $\mu'$ leaves $\mu'$ (and $\mu$)
unchanged.

Now if $o \in V_s$, there exists a conditional mapping
$\mu' = (\nu', \theta)$ such that $\mu'$ and $\mu$ are possibly compatible,
$\mathrm{dom}(\mu') = \mathrm{dom}(\mu)$, and $\mathcal{M}_{\mathcal{L}} \models v(\theta)$. The
conditional mapping $\mu'$ is such that either $\nu' = \mu$
or $\nu'(x) = \mu(x)$ for every $x \in \mathrm{dom}(\mu) \setminus \{o\}$ and
$\nu'(o) \in U$ with $v(\nu'(o)) = \mu(o)$. In either case
$\mu(P) \in v(D)$ implies

$$v(\mu'(P)) \in v(D)$$

from which eliminating v we get

$$(\mu'(P), \theta) \in G$$

or equivalently

$$\mu' \in [\![P]\!]_D.$$

Applying v to the last relation we have that

$$v(\mu') \in v([\![P]\!]_D)$$

and thus

$$\mu \in v([\![P]\!]_D).$$

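• Inductive step:

- P is $P_1$ AND $P_2$.

We have $v([\![P_1]\!]_D) = [\![P_1]\!]_{v(D)}$ and $v([\![P_2]\!]_D) =
[\![P_2]\!]_{v(D)}$ from the inductive hypothesis. We will
prove that $v([\![P_1\ \mathrm{AND}\ P_2]\!]_D) = [\![P_1\ \mathrm{AND}\ P_2]\!]_{v(D)}$.

Let $\mu \in v([\![P_1\ \mathrm{AND}\ P_2]\!]_D)$. Therefore there
exists a conditional mapping $\mu' = (\nu', \theta') \in
[\![P_1\ \mathrm{AND}\ P_2]\!]_D$ such that $\mu = v(\mu')$ and $\mathcal{M}_{\mathcal{L}} \models v(\theta')$.
Because $[\![P_1\ \mathrm{AND}\ P_2]\!]_D = [\![P_1]\!]_D \bowtie [\![P_2]\!]_D$,
there exist possibly compatible conditional mappings
$\mu'_1 = (\nu'_1, \theta'_1)$ and $\mu'_2 = (\nu'_2, \theta'_2)$ such that
$\mu' = \mu'_1 \bowtie \mu'_2$, $\mu'_1 \in [\![P_1]\!]_D$, and $\mu'_2 \in [\![P_2]\!]_D$. Because
of Proposition D.1 and the fact that $\mathcal{M}_{\mathcal{L}} \models v(\theta')$,
we have

$$\mu = v(\mu') = v(\mu'_1 \bowtie \mu'_2) = v(\mu'_1) \bowtie v(\mu'_2).$$

Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$ it also holds $\mathcal{M}_{\mathcal{L}} \models v(\theta'_1)$ and
$\mathcal{M}_{\mathcal{L}} \models v(\theta'_2)$. Therefore, $v(\mu'_1) \in v([\![P_1]\!]_D)$ and
$v(\mu'_2) \in v([\![P_2]\!]_D)$. Notice also that because $\mu'_1$ and
$\mu'_2$ are possibly compatible, $v(\mu'_1)$ and $v(\mu'_2)$ are
compatible. Therefore,

$$v(\mu'_1) \bowtie v(\mu'_2) \in v([\![P_1]\!]_D) \bowtie v([\![P_2]\!]_D)$$

which is equivalent to

$$\mu \in v([\![P_1]\!]_D) \bowtie v([\![P_2]\!]_D).$$

From the equalities of the inductive hypothesis, we
now get

$$\mu \in ([\![P_1]\!]_{v(D)} \bowtie [\![P_2]\!]_{v(D)})$$

which is equivalent to

$$\mu \in [\![P_1\ \mathrm{AND}\ P_2]\!]_{v(D)}.$$

This proof establishes that

$$v([\![P_1\ \mathrm{AND}\ P_2]\!]_D) \subseteq [\![P_1\ \mathrm{AND}\ P_2]\!]_{v(D)}.$$

The other direction of the proof is similar and goes
as follows.

Let $\mu$ be a mapping such that $\mu \in
[\![P_1\ \mathrm{AND}\ P_2]\!]_{v(D)}$. Then $\mu \in ([\![P_1]\!]_{v(D)} \bowtie [\![P_2]\!]_{v(D)})$,
which due to the inductive hypothesis gives us

$$\mu \in v([\![P_1]\!]_D) \bowtie v([\![P_2]\!]_D).$$

Therefore, there exist compatible mappings $\mu_1 \in
v([\![P_1]\!]_D)$ and $\mu_2 \in v([\![P_2]\!]_D)$ such that $\mu =
\mu_1 \bowtie \mu_2$. Thus, there exist conditional mappings
$\mu'_1 = (\nu'_1, \theta'_1) \in [\![P_1]\!]_D$ and $\mu'_2 = (\nu'_2, \theta'_2) \in [\![P_2]\!]_D$
such that $\mu_1 = v(\mu'_1)$, $\mu_2 = v(\mu'_2)$, $\mathcal{M}_{\mathcal{L}} \models v(\theta'_1)$
and $\mathcal{M}_{\mathcal{L}} \models v(\theta'_2)$. Notice also that $\mu'_1$ and $\mu'_2$ are
possibly compatible.

From Proposition D.1 and the fact that $\mu = \mu_1 \bowtie \mu_2$,
we have

$$\mu = \mu_1 \bowtie \mu_2 = v(\mu'_1) \bowtie v(\mu'_2) = v(\mu'_1 \bowtie \mu'_2).$$

Because $\mu'_1 \in [\![P_1]\!]_D$, $\mu'_2 \in [\![P_2]\!]_D$, and $\mu'_1, \mu'_2$ are
possibly compatible, we have

$$\mu'_1 \bowtie \mu'_2 \in [\![P_1]\!]_D \bowtie [\![P_2]\!]_D.$$

Now let $\mu' = (\nu', \theta')$ be a conditional mapping
such that $\mu' = \mu'_1 \bowtie \mu'_2$. Since $\mathcal{M}_{\mathcal{L}} \models v(\theta'_1)$ and
$\mathcal{M}_{\mathcal{L}} \models v(\theta'_2)$, the definition of join of two compatible
mappings gives us $\mathcal{M}_{\mathcal{L}} \models v(\theta')$. Therefore we
can apply the valuation v to $\mu'$ and get

$$v(\mu') \in v([\![P_1]\!]_D \bowtie [\![P_2]\!]_D).$$

From this and the fact that $v(\mu') = v(\mu'_1 \bowtie \mu'_2) = \mu$
we get

$$\mu \in v([\![P_1\ \mathrm{AND}\ P_2]\!]_D).$$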

- P is $P_1$ UNION $P_2$.

We have $v([\![P_1]\!]_D) = [\![P_1]\!]_{v(D)}$ and $v([\![P_2]\!]_D) =
[\![P_2]\!]_{v(D)}$ from the inductive hypothesis. We will
prove that

$$v([\![P_1\ \mathrm{UNION}\ P_2]\!]_D) = [\![P_1\ \mathrm{UNION}\ P_2]\!]_{v(D)}.$$

A mapping $\mu$ is in $[\![P_1\ \mathrm{UNION}\ P_2]\!]_{v(D)}$ iff $\mu \in
[\![P_1]\!]_{v(D)} \cup [\![P_2]\!]_{v(D)}$, which due to the inductive hypothesis
is equivalent to $\mu \in v([\![P_1]\!]_D) \cup v([\![P_2]\!]_D)$,
which can be seen to be equivalent to $\mu \in v([\![P_1]\!]_D \cup
[\![P_2]\!]_D)$, which is equivalent to

$$\mu \in v([\![P_1\ \mathrm{UNION}\ P_2]\!]_D).$$

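- P is $P_1$ FILTER R.

We have $v([\![P_1]\!]_D) = [\![P_1]\!]_{v(D)}$ from the inductive
hypothesis. We will prove that

$$v([\![P_1\ \mathrm{FILTER}\ R]\!]_D) = [\![P_1\ \mathrm{FILTER}\ R]\!]_{v(D)}.$$

Without loss of generality, we give the proof only
for the case of filters that are atomic $\mathcal{L}$-constraints
(Definition 6.18). Let $\mu$ be in $[\![P_1\ \mathrm{FILTER}\ R]\!]_{v(D)}$.
By definition, this is equivalent to $\mu \in [\![P_1]\!]_{v(D)}$
and $\mu \models R$. From the inductive hypothesis, we
now have

$$\mu \in v([\![P_1]\!]_D).$$

Thus, there exists a conditional mapping $\mu' =
(\nu', \theta') \in [\![P_1]\!]_D$ such that $v(\mu') = \mu$ and $\mathcal{M}_{\mathcal{L}} \models v(\theta')$.

Let now $\mu_1 = (\nu', \theta_1)$ be a conditional mapping
with $\theta_1 = \theta' \wedge \nu'(R)$. Because $\mu \models R$, we have
$\mathcal{M}_{\mathcal{L}} \models \mu(R)$. Therefore $\mathcal{M}_{\mathcal{L}} \models v(\nu'(R))$ since
$v(\mu') = \mu$. Now notice that because $\mathcal{M}_{\mathcal{L}} \models
v(\nu'(R))$ and $\mathcal{M}_{\mathcal{L}} \models v(\theta')$, we have $\mathcal{M}_{\mathcal{L}} \models v(\theta_1)$.
Therefore $v(\mu_1)$ is well defined and we have $v(\mu_1) =
v(\mu') = \mu$.

The way $\mu_1$ and $\mu'$ have been defined above, together
with the definition of the evaluation of FILTER
graph patterns, gives us

$$\mu_1 \in [\![P_1\ \mathrm{FILTER}\ R]\!]_D.$$

We can apply valuation v to

$$\mu_1 \in [\![P_1\ \mathrm{FILTER}\ R]\!]_D$$

and get

$$v(\mu_1) \in v([\![P_1\ \mathrm{FILTER}\ R]\!]_D)$$

which is equivalent to $\mu \in v([\![P_1\ \mathrm{FILTER}\ R]\!]_D)$.

This proof establishes that

$$[\![P_1\ \mathrm{FILTER}\ R]\!]_{v(D)} \subseteq v([\![P_1\ \mathrm{FILTER}\ R]\!]_D).$$

The other direction of the proof is similar and goes
as follows.

Let $\mu$ be a mapping in $v([\![P_1\ \mathrm{FILTER}\ R]\!]_D)$.
Then there exists a conditional mapping $\mu_1 =
(\nu_1, \theta_1) \in [\![P_1\ \mathrm{FILTER}\ R]\!]_D$ such that $v(\mu_1) = \mu$
and $\mathcal{M}_{\mathcal{L}} \models v(\theta_1)$. Therefore, from the definition of
FILTER evaluation there exists a conditional mapping
$\mu_2 = (\nu_1, \theta_2)$ such that $\mu_2 \in [\![P_1]\!]_D$, where
$\theta_1 = \theta_2 \wedge \nu_1(R)$. Since $\mathcal{M}_{\mathcal{L}} \models v(\theta_1)$, it holds
that $\mathcal{M}_{\mathcal{L}} \models v(\theta_2)$ and $\mathcal{M}_{\mathcal{L}} \models v(\nu_1(R))$. Thus

$$v(\mu_2) = v(\mu_1) = \mu \in v([\![P_1]\!]_D).$$

Now using the inductive hypothesis, we have

$$\mu \in [\![P_1]\!]_{v(D)}.$$

Because $\mathcal{M}_{\mathcal{L}} \models v(\nu_1(R))$ and $\mu = v(\mu_1)$, we have
$\mathcal{M}_{\mathcal{L}} \models \mu(R)$. Thus we also have $\mu \models R$. Hence

$$\mu \in [\![P_1\ \mathrm{FILTER}\ R]\!]_{v(D)}.$$

□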

Theorem B.2. The triple $\langle \mathcal{D}, Rep, \mathcal{Q}_{WD} \rangle$ is a representation
system.

Proof. The proof of Theorem B.2 is the same as the
proof of Theorem B.1 and differs only in the inductive step
for the OPTIONAL operator. Thus, in this case,

P is $P_1$ OPT $P_2$.

We have $v([\![P_1]\!]_D) = [\![P_1]\!]_{v(D)}$ and $v([\![P_2]\!]_D) = [\![P_2]\!]_{v(D)}$
from the inductive hypothesis. We need to prove
$v([\![P_1\ \mathrm{OPT}\ P_2]\!]_D) = [\![P_1\ \mathrm{OPT}\ P_2]\!]_{v(D)}$ or equivalently

$$v([\![P_1\ \mathrm{OPT}\ P_2]\!]_D) = ([\![P_1]\!]_{v(D)} \bowtie [\![P_2]\!]_{v(D)}) \cup ([\![P_1]\!]_{v(D)} \setminus [\![P_2]\!]_{v(D)}). \qquad (1)$$

Let $\mu \in v([\![P_1\ \mathrm{OPT}\ P_2]\!]_D)$. There exists conditional mapping
$\mu' = (\nu', \theta') \in [\![P_1\ \mathrm{OPT}\ P_2]\!]_D$ such that $\mu = v(\mu')$
and $\mathcal{M}_{\mathcal{L}} \models v(\theta')$. Since $[\![P_1\ \mathrm{OPT}\ P_2]\!]_D$ is the left outer join of
$[\![P_1]\!]_D$ and $[\![P_2]\!]_D$, that is, $([\![P_1]\!]_D \bowtie [\![P_2]\!]_D) \cup ([\![P_1]\!]_D \setminus [\![P_2]\!]_D)$,

$$\mu' \in ([\![P_1]\!]_D \bowtie [\![P_2]\!]_D) \quad \text{or} \quad \mu' \in ([\![P_1]\!]_D \setminus [\![P_2]\!]_D).$$

For the first case, i.e., $\mu' \in ([\![P_1]\!]_D \bowtie [\![P_2]\!]_D)$, the proof is
the same as in the proof of Theorem B.1, hence we finally
get that

$$\mu \in ([\![P_1]\!]_{v(D)} \bowtie [\![P_2]\!]_{v(D)})$$

and thus from Formula (1) we have

$$\mu \in [\![P_1\ \mathrm{OPT}\ P_2]\!]_{v(D)}.$$

For the second case, i.e., $\mu' \in ([\![P_1]\!]_D \setminus [\![P_2]\!]_D)$, and the
definition of difference for sets of conditional mappings we
distinguish two cases:

1. $\mu' \in [\![P_1]\!]_D$ and for all $\mu_2 \in [\![P_2]\!]_D$, $\mu'$ and $\mu_2$ are not
compatible. Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$, we can apply valuation
v to $\mu'$ and get

$$v(\mu') \in v([\![P_1]\!]_D).$$

Since also every conditional mapping $\mu_2$ of $[\![P_2]\!]_D$ is not
compatible to $\mu'$ (and hence not possibly compatible
to $\mu'$), $v(\mu')$ is not compatible to every mapping $\mu'' \in
v([\![P_2]\!]_D)$. Therefore,

$$v(\mu') \in (v([\![P_1]\!]_D) \setminus v([\![P_2]\!]_D))$$

which from our hypothesis is equivalent to

$$\mu \in ([\![P_1]\!]_{v(D)} \setminus [\![P_2]\!]_{v(D)}).$$

Hence, from Formula (1) we have

$$\mu \in [\![P_1\ \mathrm{OPT}\ P_2]\!]_{v(D)}.$$

2. $\mu'$ is the conditional mapping $(\nu', \theta')$ and there exists
conditional mapping $\mu'' = (\nu', \theta) \in [\![P_1]\!]_D$ such that

• $\mu''$ is not compatible to some mappings of $[\![P_2]\!]_D$
and

• for the rest mappings $\mu_i = (\nu_i, \theta_i) \in [\![P_2]\!]_D$, $\mu''$ and
$\mu_i$ are possibly compatible and

$\theta'$ is $\theta \wedge \bigwedge_i \bigl( \theta_i \supset \bigvee_x \neg(\mu''(x)\ \mathrm{EQ}\ \mu_i(x)) \bigr)$

for $x \in \mathrm{dom}(\mu'') \cap \mathrm{dom}(\mu_i) \cap V_s$.

Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$, we can apply valuation v to $\mu'$ and
get

$$\mu = v(\mu') \in v([\![P_1]\!]_D \setminus [\![P_2]\!]_D)$$

which can be written as

$$\mu = v(\mu') \in (v([\![P_1]\!]_D) \setminus v([\![P_2]\!]_D)).$$

To see this, notice that the above relation holds if and
only if $v(\mu') \in v([\![P_1]\!]_D)$ and it is not compatible to
every mapping $v(\mu_2)$ of $v([\![P_2]\!]_D)$. Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$ it
holds $\mathcal{M}_{\mathcal{L}} \models v(\theta)$ and thus $v(\mu') = v(\mu'') \in v([\![P_1]\!]_D)$.
Let us now take a mapping $\mu_2$ in $[\![P_2]\!]_D$. Then, a) either
$\mu''$, and consequently $\mu'$, is not compatible to $\mu_2$,
or b) $\mu''$, and consequently $\mu'$, is possibly compatible
to $\mu_2$. For the first case $v(\mu')$ is also not compatible to
$v(\mu_2)$. For the second case $v(\mu')$ is also not compatible
to $v(\mu_2)$. To see this, notice that $v(\mu')$ and $v(\mu_2)$
become compatible only when $v(\mu'(x)) = v(\mu_2(x))$
for $x \in \mathrm{dom}(\mu') \cap \mathrm{dom}(\mu_2)$. In such cases, however,
$\mathcal{M}_{\mathcal{L}} \not\models v(\theta')$ and thus $v(\mu') \notin v([\![P_1]\!]_D)$.
Continuing the proof, from our hypothesis, relation

$$\mu = v(\mu') \in (v([\![P_1]\!]_D) \setminus v([\![P_2]\!]_D))$$

now becomes

$$\mu \in ([\![P_1]\!]_{v(D)} \setminus [\![P_2]\!]_{v(D)})$$

and thus from Formula (1) we get

$$\mu \in [\![P_1\ \mathrm{OPT}\ P_2]\!]_{v(D)}.$$

This proves that $v([\![P_1\ \mathrm{OPT}\ P_2]\!]_D) \subseteq [\![P_1\ \mathrm{OPT}\ P_2]\!]_{v(D)}$.
The other direction of the proof is similar. □

C. PROOFS FOR SECTION 8 
C.1 Proof of Lemma 8.3

To show that $\bigcap Rep(D) = \bigcap Rep((D^{EQ})^*)$ we will first
prove that

$$\bigcap Rep(D) \subseteq \bigcap Rep((D^{EQ})^*).$$

Let t be an RDF triple such that $t \notin \bigcap Rep((D^{EQ})^*)$.
Then, by definition of Rep we get

$$t \notin \bigcap \{ H \mid \text{there exists valuation } v \text{ such that } \mathcal{M}_{\mathcal{L}} \models v(\phi) \text{ and } H \supseteq v((G^{EQ})^*) \}.$$

Therefore, there exists valuation v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$
and $t \notin v((G^{EQ})^*)$, and thus

• either there is no conditional triple $(t', \theta') \in (G^{EQ})^*$
such that $\mathcal{M}_{\mathcal{L}} \models v(\theta')$, that is, $\mathcal{M}_{\mathcal{L}} \not\models v(\theta')$,

• or all conditional triples $(t', \theta') \in (G^{EQ})^*$ such that
$\mathcal{M}_{\mathcal{L}} \models v(\theta')$ are such that $v(t') \neq t$.

Observe now that for conditional triples in $(G^{EQ})^*$, $\theta'$ can
be written as $\bigvee_i \theta'_i$. So, if $(t', \theta') \in (G^{EQ})^*$, then $(t', \theta'_i) \in G^{EQ}$.
Therefore, there will be a conditional triple $(t'', \theta'_i) \in G$,
such that $t'$ and $t''$ possibly differ in their object position.
In the following we construct G and we show that $t \notin v(G)$
for this particular v.

For the first case above, and since $\mathcal{M}_{\mathcal{L}} \not\models v(\theta')$ we have
that $\mathcal{M}_{\mathcal{L}} \not\models v(\theta'_i)$ for every $\theta'_i$, and thus such triples are
dropped during application of valuation v to G. Hence, if it
was the case that $t \in \bigcap Rep(D)$, it would be so only from
the second case above.

Consider now the second case above and a triple $(t', \theta') \in
(G^{EQ})^*$. Since $\mathcal{M}_{\mathcal{L}} \models v(\theta')$ and $(t', \theta'_i) \in G^{EQ}$, then some
(or even all) $\theta'_i$ would be such that $\mathcal{M}_{\mathcal{L}} \models v(\theta'_i)$.

Let us now construct the conditional graph G from $G^{EQ}$.
Since $(t', \theta'_i) \in G^{EQ}$, then there exists conditional triple
$(t'', \theta'_i) \in G$ such that $t'$ and $t''$ possibly differ in their object
position. Let $t'$ be the e-triple (s, p, o). Then:

1. If $o \in C$, then either $t''$ would be the same as $t'$, or
it would have in its object position an e-literal $\_l$ such
that $\phi \models \_l\ \mathrm{EQ}\ o$.

2. If $o \notin C$, then $t'$ and $t''$ would be the same.

Let us now apply valuation v to G. Notice that v(G) contains
only RDF triples coming from conditional triples with
a condition $\theta$ such that $v(\theta)$ is true. Thus, we could focus
only on the conditional triples of G with such conditions
(it is clear from above that such conditional triples do exist).
To construct the RDF graph v(G) it suffices to consider
the two items above when applying v to a conditional triple
$(t'', \theta'_i)$ of G.

According to the second item and since $v(t') \neq t$ (see
second case above), we have that $v(t'') \neq t$ as well. As for
the first item, if $t' = t''$, then clearly we have $v(t'') \neq t$,
since $v(t') \neq t$. Otherwise, $t''$ would be the triple $(s, p, \_l)$
such that $\phi \models \_l\ \mathrm{EQ}\ o$. In such a case, the application of v
to $t'$ would leave $t'$ unchanged, thus the RDF triple t would
contain in the object position a literal from C and one that
would be different from o (this is because we are considering
the second case for which $v(t') \neq t$). Since also $\phi \models \_l\ \mathrm{EQ}\ o$,
then every valuation $v'$ that makes $v'(\phi)$ true should make
$v'(\_l\ \mathrm{EQ}\ o)$ true as well. Thus, such valuations would map
the e-literal $\_l$ to the constant o. Since the valuation v we
consider is such a valuation, it maps $\_l$ to the constant o.
Thus, again $v(t'') \neq t$.

Therefore, we showed that v(G) cannot contain triple t
or equivalently that $t \notin v(G)$. Hence, from the definition of
Rep we have

$$t \notin \bigcap Rep(D)$$

which proves that

$$\bigcap Rep(D) \subseteq \bigcap Rep((D^{EQ})^*).$$

The other direction of the proof for showing

$$\bigcap Rep((D^{EQ})^*) \subseteq \bigcap Rep(D)$$

is similar.



C.2 Proof of Theorem 8.4 

Notice that the certain answer for q over D is the set

$$\bigcap [\![q]\!]_{Rep(D)}.$$

From the Representation Theorem and since $q \in \mathcal{Q}_{AUF}$ it
suffices to show that the algorithm computes the set

$$\bigcap Rep([\![q]\!]_D).$$

Notice that equation

$$\bigcap [\![q]\!]_{Rep(D)} = \bigcap Rep([\![q]\!]_D)$$

is a logical consequence of Definition 7.3 for the identity
query.

Having Lemma 8.3 it now suffices to prove that the given
algorithm computes the set

$$\bigcap Rep((([\![q]\!]_D)^{EQ})^*)$$

or, using the notation of Theorem 8.4, set

$$\bigcap Rep(((D_q)^{EQ})^*).$$

The first step of the algorithm evaluates q over D, that is,
it computes $D_q = (G_q, \phi)$, while the second step computes
the EQ-completed form of $D_q$, that is, $(D_q)^{EQ}$, and then its
normalized form, $((D_q)^{EQ})^*$.

It remains to show that step three computes exactly the
intersection over the RDF graphs in $Rep(((D_q)^{EQ})^*)$.

Consider the set $\bigcap Rep(((D_q)^{EQ})^*)$ or equivalently the set

$$\bigcap \{ H \mid \text{there exists valuation } v \text{ such that } \mathcal{M}_{\mathcal{L}} \models v(\phi) \text{ and } H \supseteq v(((D_q)^{EQ})^*) \}.$$

An RDF triple t belongs to the above set iff for all valuations
v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$, it holds

$$t \in v(((D_q)^{EQ})^*).$$

This is equivalent to requiring that a conditional triple
$(t', \theta')$ exists in $H_q$ such that $\mathcal{M}_{\mathcal{L}} \models v(\theta')$ and $t = v(t')$
for all valuations v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$.

The first condition, that is, requiring that $\mathcal{M}_{\mathcal{L}} \models v(\theta')$
holds for all valuations v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$, is equivalent
to requiring that $\phi \models \theta'$ holds, a requirement that step three
imposes.

As for the second condition, equation $t = v(t')$ holds for
any valuation v such that $\mathcal{M}_{\mathcal{L}} \models v(\phi)$ iff $t'$ falls under one of the
following two cases:

• it does not contain any e-literal in the object position,

• it does contain an e-literal $\_l$ and all valuations v above
map $\_l$ to the same constant $c \in C$, which t has in
its object position.

Since step three selects all conditional triples $(t', \theta)$ of $H_q$
such that $\phi \models \theta$ and $o \notin U$, the first case above is satisfied.
The second case above is out of the question: $H_q$ does not
contain such a triple since all such e-literals have already
been substituted by the respective constant $c \in C$ such that
$\phi \models \_l\ \mathrm{EQ}\ c$.

Thus, step three computes exactly the set

$$\bigcap Rep(((D_q)^{EQ})^*).$$
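The selection performed in step three can be sketched as follows. This is only an illustration: the entailment test $\phi \models \theta$ is delegated to a caller-supplied function, and the brute-force `make_entails` helper below, which enumerates a finite list of candidate valuations, is an assumption made for the example and says nothing about how entailment is decided for TCL or PCL. Conditions are modelled as Python predicates over a valuation.

```python
def certain_triples(h_q, phi_entails, e_literals):
    """Step three: keep the triples whose condition is entailed by the
    global constraint and whose object is not an e-literal."""
    return {triple for triple, theta in h_q
            if phi_entails(theta) and triple[2] not in e_literals}


def make_entails(phi, valuations):
    """Brute-force entailment over a finite list of candidate valuations."""
    models = [v for v in valuations if phi(v)]
    return lambda theta: all(theta(v) for v in models)


# Hypothetical normalized, EQ-completed graph H_q with one e-literal "_l1".
phi = lambda v: v["_l1"] in {"r1", "r2"}
candidates = [{"_l1": r} for r in ("r1", "r2", "r3")]
H_q = [(("h1", "type", "Hotspot"), lambda v: v["_l1"] != "r3"),
       (("h1", "locatedIn", "_l1"), lambda v: True),
       (("h1", "locatedIn", "r1"), lambda v: v["_l1"] == "r1")]

answer = certain_triples(H_q, make_entails(phi, candidates), {"_l1"})
```

Only the first conditional triple survives: the second has an e-literal as its object, and the condition of the third holds in some, but not all, valuations satisfying the global constraint.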



C. 3 Proof of Theorem 8.6 

Working similar to Proof C.2, it suffices to show that an 
RDF triple t is in the certain answer of q G Qauf over D, 
that is, t € p| Repdqjo), if and only if the following formula 
is valid: 

(VJ)W(j)De(t 1? ,A-i)) (i) 

Let us now construct formula 0(t, q, D, _1) given the eval- 
uation of q over D, i.e., = (G' , cj>). Recall that formula 
Q(t, q, D, _1) is a disjunction of constraints 9i for each condi- 
tional triple (t'i, 9'i) G G' such that if t and t\ have the same 
subject and predicate, is 

• 9'i if they agree in the object position as well, 

• 9[ A (J EQ o) if t has the constant o G C at the object 
position and t'i has the e-literal J G 1/ at the object 
position. 

In every other case (i.e., $t$ and $t'_i$ do not agree in the subject
and predicate, or they agree but the object of $t$ is not a constant
from $C$ or the object of $t'_i$ is not an e-literal from $U$) no
constraint $\theta_i$ is generated for those conditional triples. If no
constraint is generated at all, $\Theta(t, q, D, \_l)$ is taken to be false.
Therefore, formula (1) is either unsatisfiable (we assume that the
global constraint is always satisfiable) or of the form

$$(\forall \_l)\,(\phi(\_l) \supset \theta_1 \vee \ldots \vee \theta_k) \qquad (2)$$
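
The construction of $\Theta$ above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: constraints are kept as opaque strings, an empty list of disjuncts stands for the constraint false, and the names `ELiteral`, `CondTriple` and `theta_disjuncts` are hypothetical and do not come from the paper.

```python
from typing import NamedTuple


class ELiteral(NamedTuple):
    """Hypothetical wrapper for an e-literal such as _l1."""
    name: str


class CondTriple(NamedTuple):
    """A conditional triple (t'_i, theta'_i); theta is an opaque string."""
    s: str
    p: str
    o: object  # a constant (plain str) or an ELiteral
    theta: str


def theta_disjuncts(t, cond_triples):
    """Return the disjuncts theta_i of Theta(t, q, D, _l) for an RDF triple t.

    An empty result stands for the constraint 'false'.
    """
    s, p, o = t
    disjuncts = []
    for ct in cond_triples:
        if (ct.s, ct.p) != (s, p):
            continue  # subject or predicate differ: no constraint generated
        if ct.o == o:
            # Same object as well: theta_i is theta'_i.
            disjuncts.append(ct.theta)
        elif isinstance(ct.o, ELiteral):
            # t has a constant, t'_i has an e-literal: add the equality.
            disjuncts.append(f"({ct.theta}) AND ({ct.o.name} EQ {o})")
        # Otherwise the objects are two distinct constants: nothing generated.
    return disjuncts


# Toy evaluation result G' (assumed data, for illustration only).
G_prime = [
    CondTriple("s", "p", "a", "th1"),
    CondTriple("s", "p", ELiteral("_l1"), "th2"),
    CondTriple("s", "r", "a", "th3"),
]
example = theta_disjuncts(("s", "p", "a"), G_prime)
```

Deciding whether $t$ is a certain answer then amounts to checking validity of the implication from the global constraint to the disjunction of the returned constraints, which this sketch leaves to a solver for the constraint language.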

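Consider now an RDF triple $t \notin \bigcap \mathrm{Rep}([\![q]\!]_D)$. Then there
exists a valuation $v$ such that $M_{\mathcal{L}} \models v(\phi)$ and $t \notin v(G')$.
Therefore, $G'$ contains conditional triples $(t', \theta')$ such that
either

• $M_{\mathcal{L}} \not\models v(\theta')$ or

• $M_{\mathcal{L}} \models v(\theta')$ and $t \neq v(t')$.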

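Considering the first case above and since $M_{\mathcal{L}} \not\models v(\theta')$,
formula (2) would be unsatisfiable. To see this, notice that
$M_{\mathcal{L}} \not\models v(\theta')$ implies $M_{\mathcal{L}} \not\models v(\theta'_i)$ and $M_{\mathcal{L}} \not\models v(\theta'_i \wedge (\_l \text{ EQ } o))$,
and thus the disjunction $\theta_1 \vee \ldots \vee \theta_k$ in formula (2) is always
false, and hence the whole formula is unsatisfiable.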

To prove our result (i.e., that formula (1) is unsatisfiable
for the specific RDF triple we considered), we have to show
that formula (2) is unsatisfiable as well for the case in which
$M_{\mathcal{L}} \models v(\theta')$ and $t \neq v(t')$ (the second case above). Notice
that $t \neq v(t')$ implies one of the following cases:

• $t$ and $t'$ do not agree in the subject or predicate position,
or

• if they do, either they do not agree in the object position,
or their objects are not of the proper kind (i.e.,
the object of $t$ is a constant from $C$ and the object of
$t'$ is an e-literal from $U$), or if they are, then valuation
$v$ does not map that e-literal to that constant.

In the first case, no constraint $\theta_i$ is included in formula (2).
As for the second case, either no constraint $\theta_i$ is
generated (the case in which they also differ in the object
position) or $\theta_i$ is $\theta'_i \wedge (\_l \text{ EQ } o)$. As we pointed out above, since
valuation $v$ does not map the e-literal $\_l$ to the constant $o$,
$\theta_i$ is false. Hence, formula (2) is unsatisfiable as well.

The other direction of the proof is similar. 

D. ADDITIONAL PROPOSITIONS 

The next proposition shows that the result of applying a
valuation to the join of two possibly compatible conditional
mappings is the same as applying first the valuation to the
conditional mappings and then computing their join as in
standard RDF.

Proposition D.1. Let $v : U \to C$ be a valuation and
$\mu_1 = (\nu_1, \theta_1)$, $\mu_2 = (\nu_2, \theta_2)$ be two possibly compatible
conditional mappings such that $\mu_1 \bowtie \mu_2 = (\nu_3, \theta_3)$. Then

$$v(\mu_1 \bowtie \mu_2) = v(\mu_1) \bowtie v(\mu_2)$$

whenever these mappings are defined (i.e., whenever $M_{\mathcal{L}} \models v(\theta_3)$
and therefore $M_{\mathcal{L}} \models v(\theta_1)$ and $M_{\mathcal{L}} \models v(\theta_2)$).

The proof follows easily from the definition of join for 
conditional mappings and is omitted. 
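
As an informal check of Proposition D.1 on a single example, the following Python sketch applies a valuation before and after a join. It rests on simplifying assumptions that are not taken from the paper: constraints are conjunctions of EQ atoms only, the conditional join keeps the first mapping's binding on shared variables and adds an equality atom whenever two shared bindings differ and at least one is an e-literal, and all names (`E`, `cond_join`, `apply_val`, `std_join`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class E:
    """Hypothetical e-literal; plain strings play the role of constants."""
    name: str


def val(v, term):
    # A valuation only replaces e-literals.
    return v[term] if isinstance(term, E) else term


def holds(v, theta):
    # theta is a list of (a, b) atoms, read as a conjunction of `a EQ b`.
    return all(val(v, a) == val(v, b) for a, b in theta)


def cond_join(m1, m2):
    """Simplified join of two possibly compatible conditional mappings."""
    (n1, th1), (n2, th2) = m1, m2
    extra = []
    for x in n1.keys() & n2.keys():
        a, b = n1[x], n2[x]
        if a != b:
            if not (isinstance(a, E) or isinstance(b, E)):
                return None  # two distinct constants: not possibly compatible
            extra.append((a, b))  # record the equality the join relies on
    return ({**n2, **n1}, th1 + th2 + extra)


def apply_val(v, m):
    """v(mu): defined only when the constraint of mu holds under v."""
    nu, theta = m
    if not holds(v, theta):
        return None
    return {x: val(v, term) for x, term in nu.items()}


def std_join(a, b):
    """Join of ordinary (unconditional) compatible mappings."""
    if any(a[x] != b[x] for x in a.keys() & b.keys()):
        return None
    return {**a, **b}


l1, l2 = E("_l1"), E("_l2")
mu1 = ({"?x": "a", "?y": l1}, [])
mu2 = ({"?y": "b", "?z": l2}, [(l2, "c")])
v = {l1: "b", l2: "c"}

lhs = apply_val(v, cond_join(mu1, mu2))                 # v(mu1 JOIN mu2)
rhs = std_join(apply_val(v, mu1), apply_val(v, mu2))    # v(mu1) JOIN v(mu2)
```

With a valuation that violates the joined constraint (for instance one mapping `_l1` to a constant other than `b`), `apply_val` returns `None` for the joined mapping, which corresponds to the "whenever these mappings are defined" proviso of the proposition.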

Proposition D.2. Let $D = (G, \phi)$ be an RDF$^i$ database,
$q = (E, P)$ a CONSTRUCT query without blank nodes
in $E$, and $v$ a valuation such that $M_{\mathcal{L}} \models v(\phi)$. Then,
$v([\![P]\!]_D) = [\![P]\!]_{v(D)}$ implies $v([\![q]\!]_D) = [\![q]\!]_{v(D)}$.

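Proof. Let $[\![q]\!]_D$ be the RDF$^i$ database $D' = (G', \phi)$
where

$$G' = \bigcup_{\mu = (\nu, \theta) \in [\![P]\!]_D} \{ (t, \theta) \mid t \in (\mu(f_\mu(E)) \cap ((I \cup B) \times I \times T)) \}.$$

Then $v([\![q]\!]_D)$ is the RDF graph $v(D')$ where

$$v(D') = \bigcup_{\mu \in [\![P]\!]_D} \{ v(t) \mid t \in (\mu(f_\mu(E)) \cap ((I \cup B) \times I \times T)) \text{ and } \mu = (\nu, \theta) \text{ such that } M_{\mathcal{L}} \models v(\theta) \}. \qquad (1)$$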

Likewise, let $[\![q]\!]_{v(D)}$ be the RDF graph $H$. According to
the definition of the evaluation of CONSTRUCT queries
on RDF graphs [20], $H$ is the following set:

$$H = \bigcup_{\mu \in [\![P]\!]_{v(D)}} \left( \mu(f_\mu(E)) \cap ((I \cup B) \times I \times T) \right) \qquad (2)$$

To prove our proposition, we have to show that $H = v(D')$.

Let $t \in H$ be an RDF triple. Then, there exists a mapping
$\mu \in [\![P]\!]_{v(D)}$ such that

$$t \in (\mu(f_\mu(E)) \cap ((I \cup B) \times I \times T)) \qquad (3)$$

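From our assumption that $v([\![P]\!]_D) = [\![P]\!]_{v(D)}$ we have

$$\mu \in v([\![P]\!]_D).$$

Therefore, there exists a conditional mapping $\mu' = (\nu', \theta') \in [\![P]\!]_D$
such that $M_{\mathcal{L}} \models v(\theta')$ and $\mu = v(\mu')$. Since $\mu = v(\mu')$,
relation (3) is written as

$$t \in (v(\mu'(f_\mu(E))) \cap ((I \cup B) \times I \times T))$$

which is equivalent to the following

$$t \in v(\mu'(f_\mu(E)) \cap ((I \cup B) \times I \times T)). \qquad (4)$$

From (1) and since $M_{\mathcal{L}} \models v(\theta')$ and $\mu' \in [\![P]\!]_D$, we have

$$v(\mu'(f_{\mu'}(E)) \cap ((I \cup B) \times I \times T)) \subseteq v(D').$$

Because also of relation (4) we get

$$t \in v(D').$$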

Hence, we showed that every triple of H is a triple of v(D'). 

The other direction of the proof is similar and goes as 
follows. 



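Let $t \in v(D')$. Then there exists a conditional mapping
$\mu = (\nu, \theta) \in [\![P]\!]_D$ and a conditional triple $t_c = (t', \theta) \in G'$ such
that $M_{\mathcal{L}} \models v(\theta)$ and $v(t_c) = v(t') = t$. From (1) we then have

$$t' \in (\mu(f_\mu(E)) \cap ((I \cup B) \times I \times T)). \qquad (5)$$

Since $M_{\mathcal{L}} \models v(\theta)$, $v(\mu)$ is defined and thus we have

$$v(\mu) \in v([\![P]\!]_D)$$

from which, by our assumption that $v([\![P]\!]_D) = [\![P]\!]_{v(D)}$, we get

$$\mu' = v(\mu) \in [\![P]\!]_{v(D)}.$$

Thus, applying valuation $v$ to (5) we get

$$v(t') \in v(\mu(f_\mu(E)) \cap ((I \cup B) \times I \times T))$$

which is equivalent to

$$t \in (v(\mu(f_\mu(E))) \cap ((I \cup B) \times I \times T)).$$

Since $\mu' = v(\mu)$, the above relation becomes

$$t \in (\mu'(f_\mu(E)) \cap ((I \cup B) \times I \times T)).$$

From the above and because of (2) and $\mu' \in [\![P]\!]_{v(D)}$ we have

$$t \in (\mu'(f_{\mu'}(E)) \cap ((I \cup B) \times I \times T)) \subseteq H$$

and hence

$$t \in H.$$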

□ 
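
The statement of Proposition D.2 can be tried out on a toy example with the Python sketch below. It is an illustration under assumptions that go beyond the text: constraints are conjunctions of EQ atoms, the well-formedness filter $(I \cup B) \times I \times T$ is approximated by "no unbound variable remains", the template has no blank nodes so the renaming $f_\mu$ is ignored, and $v([\![P]\!]_D)$ is computed directly and taken to coincide with $[\![P]\!]_{v(D)}$, as the proposition assumes. All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class E:
    """Hypothetical e-literal; '?x' strings are variables, other strings constants."""
    name: str


def val(v, term):
    # A valuation only replaces e-literals.
    return v[term] if isinstance(term, E) else term


def holds(v, theta):
    # theta is a tuple of (a, b) atoms, read as a conjunction of `a EQ b`.
    return all(val(v, a) == val(v, b) for a, b in theta)


def instantiate(template, nu):
    """mu(E), keeping only triples with no unbound variable left."""
    out = set()
    for pattern in template:
        triple = tuple(nu.get(x, x) for x in pattern)
        if not any(isinstance(x, str) and x.startswith("?") for x in triple):
            out.add(triple)
    return out


def construct_cond(template, cond_mappings):
    """G': the conditional triples (t, theta) of the answer over D."""
    return {(t, th) for nu, th in cond_mappings for t in instantiate(template, nu)}


def valuate_graph(v, cond_triples):
    """v(D'): keep triples whose condition holds, then replace e-literals."""
    return {tuple(val(v, x) for x in t) for t, th in cond_triples if holds(v, th)}


def valuate_mappings(v, cond_mappings):
    """v([[P]]_D), assumed here to equal [[P]]_{v(D)}."""
    return [{x: val(v, a) for x, a in nu.items()}
            for nu, th in cond_mappings if holds(v, th)]


def construct_plain(template, mappings):
    """H: standard CONSTRUCT evaluation over ordinary mappings."""
    return set().union(*(instantiate(template, m) for m in mappings))


l1 = E("_l1")
template = [("?x", "knows", "?y")]
P_D = [
    ({"?x": "ann", "?y": l1}, ((l1, "bob"),)),
    ({"?x": "cat"}, ()),                          # ?y unbound: triple dropped
    ({"?x": "dan", "?y": "eve"}, ((l1, "zed"),)),  # condition fails under v
]
v = {l1: "bob"}

v_of_answer = valuate_graph(v, construct_cond(template, P_D))        # v([[q]]_D)
answer_of_v = construct_plain(template, valuate_mappings(v, P_D))    # [[q]]_{v(D)}
```

Both sides yield the same RDF graph on this input, which is what the two inclusions of the proof establish in general.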



